Abstract

Compressed Sensing (CS) is an appealing framework for applications such as Magnetic
Resonance Imaging (MRI). However, to date, the sensing schemes suggested by CS theories
are made of random isolated measurements, which are usually incompatible with the
physics of acquisition. To reflect the physical constraints of the imaging device, we introduce
the notion of blocks of measurements: the sensing scheme is no longer a set of isolated
measurements, but a set of groups of measurements which may represent any arbitrary
shape (parallel or radial lines for instance). Structured acquisitions with blocks of
measurements are easy to implement, and provide good reconstruction results in practice. However,
very few results exist on the theoretical guarantees of CS reconstructions in this setting. In
this paper, we derive new CS results for structured acquisitions and signals satisfying a prior
structured sparsity. The obtained results provide a recovery probability of sparse vectors
that explicitly depends on their support. Our results are thus support-dependent and offer
the possibility for flexible assumptions on the sparsity structure. Moreover, the results are
drawing-dependent, since we highlight an explicit dependency between the probability of
reconstructing a sparse vector and the way of choosing the blocks of measurements. Numerical
simulations show that the proposed theory is faithful to experimental observations.

Key-words: Compressed Sensing, blocks of measurements, structured sparsity, MRI, exact
recovery, $\ell_1$ minimization.

1 Introduction 

Since its introduction in [CRT06b, Don06], compressed sensing has triggered massive interest in
fundamental and applied research. However, despite recent progress, existing theories are still
insufficient to explain the success of compressed acquisitions in many practical applications. Our
aim in this paper is to extend the applicability of the theory by combining two new ingredients:
structured sparsity and acquisition structured by blocks.

1.1 A brief history of compressed sensing 

In this section, we provide a brief history of the compressed sensing evolution, with a particular 
emphasis on Fourier imaging, in order to better highlight our contribution. 
1.1.1 Sampling with matrices with i.i.d. entries


Compressed sensing - as proposed in [CT06] - consists in recovering a signal $x \in \mathbb{C}^n$ from a
vector of measurements $y = Ax$, where $A \in \mathbb{C}^{m \times n}$ is the sensing matrix. Typical theorems state
that if A is an i.i.d. Gaussian matrix, x is s-sparse, and $m \gtrsim s \log(n)$, then x can be recovered
exactly from y by solving the following $\ell_1$ minimization problem:

$$\min_{x \in \mathbb{C}^n,\ Ax = y} \|x\|_1. \qquad (1)$$

Moreover, it can be shown that the recovery is robust to noise if the constraint in (1) is penalized.
An important fact about this theorem is that the number of measurements mostly depends on
the intrinsic dimension s rather than the ambient dimension n.
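
As an illustration of the $\ell_1$ minimization problem above, here is a minimal sketch restricted to real-valued signals, where the problem can be written as a linear program. The function name `basis_pursuit`, the problem sizes, and the use of SciPy's `linprog` are choices made for this example and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to Ax = y for real A, y via a linear program.

    Variables are stacked as (x, t) with |x_i| <= t_i, and sum(t) is minimized.
    """
    m, n = A.shape
    cost = np.concatenate([np.zeros(n), np.ones(n)])
    eye = np.eye(n)
    # x - t <= 0 and -x - t <= 0 together encode |x| <= t.
    A_ub = np.block([[eye, -eye], [-eye, -eye]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:n]


rng = np.random.default_rng(0)
n, m, s = 40, 25, 3                      # illustrative sizes only
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.standard_normal(s)
x_hat = basis_pursuit(A, A @ x_true)
recovery_error = np.max(np.abs(x_hat - x_true))
```

With these sizes the number of measurements is well above the sparsity level, so the reconstruction error is expected to be at the level of the solver tolerance.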


1.1.2 Uniform sampling from incoherent bases 

Nearly at the same time, the theory was extended to random linear projections from orthogonal
bases [CRT06b, Rau10, CP11, FR13]. Let $A_0 \in \mathbb{C}^{n \times n}$ denote an orthogonal matrix with rows
$(a_i^*)_{1 \le i \le n} \in \mathbb{C}^n$. A sensing matrix A can be constructed by randomly drawing rows as follows

$$A = \sqrt{\frac{n}{m}} \left( a_{J_\ell}^* \right)_{1 \le \ell \le m}, \qquad (2)$$

where $(J_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of a uniform random variable J with $P(J = j) = \pi_j = 1/n$, for
all $1 \le j \le n$. The coherence of matrix $A_0$ can be defined by

$$\kappa(A_0) = n \cdot \max_{1 \le i \le n} \|a_i\|_\infty^2.$$

A typical result in this setting states that if $m \gtrsim \kappa(A_0) s \ln(n/\varepsilon)$ then an s-sparse vector x
can be exactly recovered using the $\ell_1$-minimization problem (1) with probability exceeding $1 - \varepsilon$.
This type of theorem is particularly helpful to explain the success of recovery of sparse signals
(spikes) from Fourier measurements, since in that case $\kappa(A_0) = 1$.
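
The coherence defined above is easy to evaluate numerically. The sketch below assumes the rows of the array `A0` play the role of the $a_i^*$, and compares the unitary discrete Fourier matrix with the identity; the helper name `coherence` and the size n = 64 are illustrative.

```python
import numpy as np


def coherence(A0):
    """kappa(A0) = n * max_i ||a_i||_inf^2, where a_i^* are the rows of A0."""
    n = A0.shape[0]
    return n * np.max(np.abs(A0)) ** 2


n = 64
dft = np.fft.fft(np.eye(n), norm="ortho")   # unitary DFT matrix
kappa_fourier = coherence(dft)               # all entries have modulus 1/sqrt(n)
kappa_identity = coherence(np.eye(n))        # rows are canonical basis vectors
```

The Fourier matrix reaches the smallest possible value 1, whereas the identity reaches the largest possible value n.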


1.1.3 The emergence of variable density sampling 


Unfortunately, in most applications, the sensing matrix $A_0$ is coherent, meaning that $\kappa(A_0)$ is
large. In practice, uniformly drawn measurements lead to very poor reconstructions. A natural
idea to reduce the coherence consists in drawing the highly coherent rows of $A_0$ more often than
the others.

A byproduct of standard compressed sensing results implies that variable density
sampling [PVW11, CCW13, KW14] allows perfect reconstruction with a limited (but usually
too high) number of measurements. This idea is captured by the following result.

Let $A_0 \in \mathbb{C}^{n \times n}$ denote an orthogonal matrix with rows $(a_i^*)_{1 \le i \le n} \in \mathbb{C}^n$. Let A denote the
random matrix

$$A = \frac{1}{\sqrt{m}} \left( \frac{1}{\sqrt{\pi_{J_\ell}}} a_{J_\ell}^* \right)_{1 \le \ell \le m}, \qquad (3)$$

where $(J_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of a random variable J with
$P(J = j) = \pi_j = \frac{\|a_j\|_\infty^2}{\sum_{i=1}^n \|a_i\|_\infty^2}$, for all $1 \le j \le n$.

Let x denote an s-sparse vector and set $m \gtrsim \sum_{i=1}^n \|a_i\|_\infty^2 \cdot s \ln(n/\varepsilon)$. Then, the minimizer
of (1) coincides with x, with probability larger than $1 - \varepsilon$.

Unfortunately, it is quite easy to show experimentally that this principle cannot explain the
success of CS in applications such as Fourier imaging. The flip test proposed in [AHPR13] is a
striking illustration of this fact.
1.1.4 Variable density sampling with structured sparsity

A common aspect of the above results is that they assume no structure - apart from sparsity
- in the signals to recover. Recovering arbitrary sparse vectors is a very demanding property
that precludes the use of CS in many practical settings. Exact recovery conditions for sparse
vectors with a structured support appeared quite early, with the work of Tropp [Tro06]. To
the best of our knowledge, the work [AHPR13] is the first to provide explicit constructions
of random matrices that allow recovering sparse signals with a structured support. The theory
in [AHPR13] also suggests variable density sampling strategies. There is however one major
difference compared to the previously mentioned contributions: the density should depend both
on the sensing basis and the sparsity structure. The authors develop a comprehensive theory
for Fourier sampling, based on isolated measurements under a sparsity-by-levels assumption
in the wavelet domain. They illustrate through extensive numerical experiments in [AHR14b]
that sampling structured signals in coherent bases can significantly outperform i.i.d. Gaussian
measurements - usually considered as an optimal sampling strategy. This theory will be reviewed
and compared to ours in Section 4.

1.1.5 An example in MRI 

To fix the ideas, let us illustrate the application of the previously described theory in the context
of Magnetic Resonance Imaging (MRI). In MRI, images are sampled in the Fourier domain and
can be assumed to be sparse in the wavelet domain. Figure 1 (a) illustrates a variable density
sampling pattern: the white dots indicate which Fourier coefficients are probed. Figure 1 (b)
is the reconstruction of a phantom image from the measurements in (a) via $\ell_1$-minimization.
Figure 1 (c) is a zoom on the reconstruction. As can be seen, only 4.6% of the coefficients are
enough to reconstruct a well resolved image.

1.2 The need for new results 

Probing measurements independently at random is infeasible - or at least impractical - in most
measuring instruments. This is the case in MRI, where the samples have to lie on piecewise
smooth trajectories [LDP07, CCKW14, CWKC16]. The same situation occurs in a number of
other devices such as Electron [LSMH13] and X-ray Tomography [PSV09], radio-interferometry
[WJP+09], mobile sensing [TH08], etc. As a result, concrete applications of CS often rely on
sampling schemes that strongly deviate from theory, to account for physical constraints intrinsic
to each instrument. Despite having no solid theoretical foundation, these heuristic strategies
work very well in practice.

This fact is illustrated in Figure 2. In this numerical experiment, parallel lines drawn
independently at random generate a very structured sampling pattern in the Fourier domain, see Figure
2 (a). As can be seen in Figure 2 (b) and (c), this highly structured pattern makes it possible
to recover well resolved images using an $\ell_1$-minimization reconstruction.

To the best of our knowledge, there currently exists no theory able to explain this favorable
behavior. The only works dealing with such an acquisition are [PDG15, BBW14]. They assume
no structure in the sparsity and we showed in [BBW14] that structure was crucially needed to
explain results such as those in Figure 2. We will recall this result in Section 4.3.1.

1.3 Contributions

The main contribution of this paper is to derive a new compressed sensing theory:

(i) giving recovery guarantees with an explicit dependency on the support of the vector to
reconstruct,

(ii) based on block-structured acquisition.

Figure 1: An example of reconstruction of a 2048 x 2048 MR image from isolated measurements.
(a) Sensing pattern from a variable density sampling strategy (with 4.6% measurements). (b)
Corresponding reconstruction via $\ell_1$-minimization. (c) A zoom on a part of the reconstructed
image. (d) Image obtained by using the pseudo-inverse transform (SNR = 21 dB). (e) A zoom on
a part of this image.



Informally, our main result (Theorem 3.3) reads as follows. Let $x \in \mathbb{C}^n$ denote a vector with
support $S \subset \{1, \dots, n\}$. Draw m blocks of measurements with a distribution $\pi \in \mathbb{R}^M$, where M
denotes the number of available blocks. If $m \gtrsim \Gamma(S, \pi) \ln(n/\varepsilon)$, the vector x is recovered by $\ell_1$
minimization with probability greater than $1 - \varepsilon$.

The proposed theory has a few important consequences: 


• The block structure proposed herein enriches the family of sensing matrices available for
CS. Existing theories for structured sampling do not take constraints of the sampling device
into account. Therefore, the proposed theory gives keys to design realistic structured
sampling.

• Our theorem significantly departs from most works that consider reconstruction of any
s-sparse vector. This is similar in spirit to the works [AH15, AHPR13]. However, this is
the first time that the dependency on the support S and the drawing probability $\pi$ is made
explicit through the quantity $\Gamma(S, \pi)$. This provides many possibilities such as optimizing
the drawing probability $\pi$ or identifying the classes of supports recoverable with block
sampling strategies.

• The proposed approach generalizes most existing compressed sensing theories. In
particular, it allows recovering all the results mentioned in the introduction.

• The provided theory seems to predict accurately practical Fourier sampling experiments,
which is quite rare in this field. The example given in Figure 2 can be analyzed precisely.
In particular, we show that a block structured acquisition can be used only if the support
structure is adapted to it. The resulting structures are more complex than the sparsity by
levels of [AHPR13].

• The proposed theory allows envisioning the use of CS in situations that were not possible
before. The use of incoherent transforms is not necessary anymore, provided that the support
S has some favorable properties.

• The usual restricted isometry constant or coherence are replaced by the quantity $\Gamma(S, \pi)$,
which seems to be much more adapted to describe the practical success of CS.

Figure 2: An example of reconstruction of a 2048 x 2048 MR image from blocks of measurements.
(a) Sampling pattern made of horizontal lines (13% of measurements). (b) Corresponding reconstruction
via $\ell_1$-minimization. (c) A zoom on a part of the reconstructed image. (d) Image obtained by
using the pseudo-inverse transform. (e) A zoom on a part of this image.

1.4 Related notions in the literature 

In this paper, structured acquisition denotes the constraints imposed by the physics of the
acquisition, which are modeled using blocks of measurements extracted from a full deterministic
matrix $A_0$. This notion of structured acquisition differs from the notion of structured random
matrices, as described in [Rau10] and [DE11]. Indeed, this latter strategy is based on acquiring
isolated measurements randomly drawn from the rows of a deterministic matrix. The resulting
sensing matrix thus has some inherent structure, which is not the case for random matrices with
i.i.d. entries, which were initially considered in CS. In our paper, the sensing matrix A is even
more structured, in the sense that the full sampling matrix $A_0$ has been partitioned into blocks
of measurements.

We also focus on obtaining RIPless results by combining structured acquisition and
structured sparsity. RIPless results [CP11] refer to CS approaches that are non-uniform in the sense
that they hold for a given sensing matrix A and a given support S of length s, but not for all
s-sparse vectors. Nevertheless, existing RIPless results in the literature are only based on the
degree of sparsity $s = |S|$. A main novelty of this paper is to develop RIPless results that
explicitly depend on the support S (and not only on its cardinality s) of the signal to reconstruct.
This strategy allows incorporating any kind of prior information on the structure of S to study
its influence on the quality of CS reconstructions.

Structured sparsity is a concept that appeared early in the history of compressed sensing. The
works [Tro06, GN08, HSIG13] provide sufficient conditions to recover structured sparse signals
by using orthogonal matching pursuit or basis pursuit algorithms. Similar conditions (inexact
dual certificate) are used in our work. The main novelty and difficulty in our contribution is to
show that very structured sampling matrices satisfy these conditions.

Other authors [EM09, BCDH10, DE11, BJMO12] proposed to change the recovery algorithm
when prior knowledge of structured sparsity is available. Their study is usually restricted to
random sub-Gaussian matrices which have no structure at all. At this point, we do not know
if better recovery guarantees could be obtained by using structured recovery algorithms with
structured sampling.

Finally, let us mention that a few papers recently considered the problem of mobile sampling
[UV13b, UV13a, GRUV14]. In these papers, the authors provide theoretical guarantees for
the exact reconstruction of bandlimited functions in the spirit of Shannon's sampling theorem.
These papers thus strongly differ from our compressed sensing perspective.

1.5 Organization of the paper 

The paper is organized as follows. Section 2 gives the formal setting of structured acquisition.
Section 3 gives the main results, with a precise definition of $\Gamma(S, \pi)$. Applications of our main
theorem to various settings are presented in Section 4. Technical appendices contain the proofs
of the main results of this paper.

2 Preliminaries 

2.1 Notation 

In this paper, n denotes the dimension of the signal to reconstruct. The notation $S \subset \{1, \dots, n\}$
refers to the support of the signal to reconstruct. The vectors $(e_i)_{1 \le i \le d}$ denote the vectors of the
canonical basis of $\mathbb{R}^d$, where d will be equal to n or $\sqrt{n}$, depending on the context. In the sequel,
we set $P_S \in \mathbb{R}^{n \times n}$ to be the projection matrix onto $\mathrm{span}(\{e_i, i \in S\})$, i.e. the diagonal matrix
with the j-th diagonal entry equal to 1 if $j \in S$, and 0 otherwise. We will use the shorthand
notation $M_S \in \mathbb{C}^{n \times n}$ and $v_S \in \mathbb{C}^n$ to denote the matrix $M P_S$ and the vector $P_S v$ for $M \in \mathbb{C}^{n \times n}$
and $v \in \mathbb{C}^n$. Similarly, if $M_k$ denotes a matrix indexed by k, then $M_{k,S} = M_k P_S$. For any
matrix M, for any $1 \le p, q \le \infty$, the operator norm $\|M\|_{p \to q}$ is defined as

$$\|M\|_{p \to q} = \sup_{\|v\|_p \le 1} \|Mv\|_q,$$

with $\|\cdot\|_p$ and $\|\cdot\|_q$ denoting the standard $\ell_p$ and $\ell_q$ norms. Note that for a matrix $M \in \mathbb{R}^{n \times n}$,

$$\|M\|_{\infty \to \infty} = \max_{1 \le i \le n} \|e_i^* M\|_1.$$

The function $\mathrm{sign} : \mathbb{R}^n \to \mathbb{R}^n$ is defined by

$$(\mathrm{sign}(x))_i = \begin{cases} 1 & \text{if } x_i > 0, \\ -1 & \text{if } x_i < 0, \\ 0 & \text{if } x_i = 0, \end{cases}$$

and $\mathrm{Id}_n$ will denote the n-dimensional identity matrix.
2.2 Sampling strategy

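In this paper, we assume that we are given some orthogonal matrix $A_0 \in \mathbb{C}^{n \times n}$ representing the
set of possible linear measurements imposed by a specific sensor device. Let $(I_k)_{1 \le k \le M}$ denote a
partition of the set $\{1, \dots, n\}$. The rows $(a_i^*)_{1 \le i \le n} \in \mathbb{C}^n$ of $A_0$ are partitioned into the following
blocks dictionary $(B_k)_{1 \le k \le M}$, such that

$$B_k = (a_i^*)_{i \in I_k} \in \mathbb{C}^{|I_k| \times n} \quad \text{s.t. } I_k \subset \{1, \dots, n\},$$

with $\sqcup_{k=1}^M I_k = \{1, \dots, n\}$. The sensing matrix A is then constructed by randomly drawing
blocks as follows

$$A = \frac{1}{\sqrt{m}} \left( \frac{1}{\sqrt{\pi_{K_\ell}}} B_{K_\ell} \right)_{1 \le \ell \le m}, \qquad (4)$$

where $(K_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of a random variable K such that

$$P(K = k) = \pi_k,$$

for all $1 \le k \le M$. Moreover, thanks to the renormalization of the blocks $B_{K_\ell}$ by the weights
$1/\sqrt{\pi_{K_\ell}}$ in model (4), the random block $B_K$ satisfies

$$E\left( \frac{B_K^* B_K}{\pi_K} \right) = \sum_{k=1}^M B_k^* B_k = \mathrm{Id}, \qquad (5)$$

since $A_0$ is orthogonal and $(B_k)_{1 \le k \le M}$ is a partition of the rows of $A_0$.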
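
The isotropy property of the renormalized blocks can be checked numerically. The following sketch assumes a unitary Fourier matrix for $A_0$, consecutive blocks of four rows, and an arbitrary drawing distribution; the variable names and sizes are illustrative and not part of the paper.

```python
import numpy as np

n, block_size = 16, 4
A0 = np.fft.fft(np.eye(n), norm="ortho")            # orthogonal (unitary) A0
# Partition of the row indices into consecutive index sets I_k.
index_sets = [np.arange(k, k + block_size) for k in range(0, n, block_size)]
blocks = [A0[I_k, :] for I_k in index_sets]          # B_k = (a_i^*)_{i in I_k}
M = len(blocks)

rng = np.random.default_rng(1)
pi = rng.random(M)
pi /= pi.sum()                                       # arbitrary drawing distribution

# E[B_K^* B_K / pi_K] = sum_k pi_k * (B_k^* B_k / pi_k) = sum_k B_k^* B_k
expectation = sum(p * (B.conj().T @ B) / p for p, B in zip(pi, blocks))
isotropy_gap = np.max(np.abs(expectation - np.eye(n)))
```

The gap to the identity is at the level of floating-point rounding, whatever the distribution, because the weights $1/\pi_k$ cancel the drawing probabilities.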

Remark 2.1. The case of overlapping blocks can also be handled. To do so, we may define the
blocks $(B_k)_{1 \le k \le M}$ as follows:

$$B_k = \left( \frac{1}{\sqrt{\alpha_i}} a_i^* \right)_{i \in I_k}, \quad \text{for } 1 \le k \le M,$$

where $\bigcup_{k=1}^M I_k = \{1, \dots, n\}$. The coefficients $(\alpha_i)_{1 \le i \le n}$ denote the multiplicity of the row $a_i^*$,
namely the number of appearances $\alpha_i = |\{k, i \in I_k\}|$ of this row in different blocks. This
renormalization is sufficient to ensure the isotropy condition $E\left( \frac{B_K^* B_K}{\pi_K} \right) = \mathrm{Id}$, where K is defined
as above.

Note that our block sampling strategy encompasses the standard acquisition based on isolated
measurements. Indeed, isolated measurements can be considered as blocks of measurements
consisting of only one row of $A_0$.


Remark 2.2. More generally, the theorems could be extended - with slight adaptations - to the
case where the sensing matrix is

$$A = \frac{1}{\sqrt{m}} \begin{pmatrix} B_{K_1} \\ \vdots \\ B_{K_m} \end{pmatrix},$$

where $B_{K_1}, \dots, B_{K_m}$ are i.i.d. copies of a random matrix $B \in \mathbb{C}^{b \times n}$ satisfying

$$E(B^* B) = \mathrm{Id}.$$

The integer b is itself random and Id is the $n \times n$ identity matrix. Assuming that B takes its value
in a countable family $(B_k)_{k \in \mathcal{K}}$, this formalism covers a large number of applications described in
[BBW14]: (i) blocks with i.i.d. entries, (ii) partition of the rows of orthogonal transforms, (iii)
cover of the rows of orthogonal transforms, (iv) cover of the rows from tight frames.
3 Main Results


3.1 Fundamental quantities 


Before introducing our main results, we need to define some quantities (reminiscent of the 
coherence) that will play a key role in our analysis. 


Definition 3.1. Consider a blocks dictionary $(B_k)_{1 \le k \le M}$. Let $S \subset \{1, \dots, n\}$ and $\pi$ be a
probability distribution on $\{1, \dots, M\}$. Define

$$\Theta(S, \pi) := \max_{1 \le k \le M} \frac{1}{\pi_k} \|B_k^* B_{k,S}\|_{\infty \to \infty} = \max_{1 \le k \le M} \max_{1 \le i \le n} \frac{\|e_i^* B_k^* B_{k,S}\|_1}{\pi_k}, \qquad (6)$$

$$\Upsilon(S, \pi) := \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^M \frac{1}{\pi_k} |e_i^* B_k^* B_{k,S} v|^2, \qquad (7)$$

$$\Gamma(S, \pi) := \max \left( \Upsilon(S, \pi), \Theta(S, \pi) \right). \qquad (8)$$

For the sake of readability, we will sometimes use the shorter notation $\Theta$, $\Upsilon$ and $\Gamma$ to denote
$\Theta(S, \pi)$, $\Upsilon(S, \pi)$ and $\Gamma(S, \pi)$. In Definition 3.1, $\Theta$ is related to the local coherence and the degree
of sparsity, when the blocks are made of only one row (the case of isolated measurements).
Indeed, in such a case, $\Theta$ reads as follows

$$\Theta(S, \pi) := \max_{1 \le k \le n} \frac{\|a_k\|_\infty \|a_{k,S}\|_1}{\pi_k} \le s \cdot \max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k}.$$

The quantity $\max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k}$ refers to the usual notion of coherence described in [CP11]. The
quantity $\Upsilon$ is new and it is more delicate to interpret. It reflects an inter-block coherence. A
rough upper-bound for $\Upsilon$ is

$$\Upsilon(S, \pi) \le \sum_{k=1}^M \frac{1}{\pi_k} \|B_k^* B_{k,S}\|_{\infty \to \infty}^2,$$

obtained by switching the maximum and supremum with the sum in the definition of $\Upsilon$. However, it is
important to keep this order (maximum, supremum and sum) to measure interferences between
blocks. In Section 4, we give more precise evaluations of $\Theta(S, \pi)$ and $\Upsilon(S, \pi)$ in particular cases.
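
To make the quantities of Definition 3.1 concrete, here is a small sketch that evaluates $\Theta(S, \pi)$ exactly and the rough upper bound on $\Upsilon(S, \pi)$ (the sum of squared $\|\cdot\|_{\infty \to \infty}$ norms weighted by $1/\pi_k$). It assumes isolated Fourier measurements with a uniform drawing distribution; the helper names, the support and the size are hypothetical, and indices are 0-based in the code.

```python
import numpy as np


def inf_inf_norm(M):
    """||M||_{inf->inf}: the largest l1 norm of a row of M."""
    return np.max(np.sum(np.abs(M), axis=1))


def theta_and_upsilon_bound(blocks, support, pi):
    """Return Theta(S, pi) and the rough upper bound on Upsilon(S, pi)."""
    n = blocks[0].shape[1]
    P_S = np.zeros((n, n))
    P_S[support, support] = 1.0                      # projection onto the support
    norms = np.array([inf_inf_norm(B.conj().T @ (B @ P_S)) for B in blocks])
    theta = np.max(norms / pi)
    upsilon_bound = np.sum(norms ** 2 / pi)
    return theta, upsilon_bound


n = 16
A0 = np.fft.fft(np.eye(n), norm="ortho")
support = np.array([0, 3, 5])                        # hypothetical support, s = 3
isolated = [A0[k:k + 1, :] for k in range(n)]        # one row per block
uniform = np.full(n, 1.0 / n)
theta, upsilon_bound = theta_and_upsilon_bound(isolated, support, uniform)
```

In this example $\Theta$ equals s = 3, while the rough bound on $\Upsilon$ equals $s^2 = 9$, which illustrates why the order of the maximum, supremum and sum matters.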


Remark 3.2 (Support-dependency and drawing-dependency). In Definition 3.1, the quantities
$\Theta$ and $\Upsilon$ are drawing-dependent and support-dependent. Indeed, $\Gamma$ does not only depend on the
degree of sparsity $s = |S|$. To the best of our knowledge, existing theories in CS only rely on
s, see [CRT06a, CP11], or on degrees of sparsity structured by levels, see [AHPR13]. Since
$\Gamma$ is explicitly related to S, this allows incorporating prior assumptions on the structure of S.
Besides, the dependency on $\pi$ (i.e. the way of drawing the measurements) is also explicit in the
definition of $\Gamma$. This offers the flexibility to analyze the influence of $\pi$ on the required number of
measurements. We therefore believe that the introduced quantities might play an important role
in the future analysis of CS.


3.2 Exact recovery guarantees 

Our main result reads as follows. 

Theorem 3.3. Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality $s \ge 16$ and suppose that
$x \in \mathbb{C}^n$ is an s-sparse vector supported on S. Fix $\varepsilon \in (0, 1)$. Suppose that the sampling matrix
A is constructed as in (4). Suppose that $\Gamma(S, \pi) \ge 1$. If

$$m \ge 73 \cdot \Gamma(S, \pi) \ln(64 s) \left( \ln\left( \frac{9n}{\varepsilon} \right) + \ln \ln(64 s) \right), \qquad (9)$$

then x is the unique solution of (1) with probability larger than $1 - \varepsilon$.

Remark 3.4. In the sequel, we will simplify condition (9) by writing:

$$m \ge C \cdot \Gamma(S, \pi) \ln(s) \ln\left( \frac{n}{\varepsilon} \right),$$

where C is a universal constant.


The proof of Theorem 3.3 is contained in Appendix A.1. It relies on the construction of an
inexact dual certificate satisfying appropriate properties that are described in Lemma A.1. Then
our proof is based on the so-called golfing scheme introduced in [Gro11] for matrix completion
and adapted by [CP11] for compressed sensing from isolated measurements. In the golfing
scheme, the main difficulty is to control operator norms of random matrices extracted from
the sensing matrix A. In [CP11], it is proposed to control (in probability) the operator norms
$\|\cdot\|_{\infty \to 2}$ and $\|\cdot\|_{2 \to 2}$. However, this technique only gives results depending on the degree of
sparsity s. In order to include an explicit dependency on the support S, one has to modify the
golfing scheme in [CP11] by controlling the operator norm $\|\cdot\|_{\infty \to \infty}$, instead of controlling the
operator norms $\|\cdot\|_{\infty \to 2}$ and $\|\cdot\|_{2 \to 2}$. A similar idea has been developed in [AHPR13].

Remark 3.5. Compared to most compressed sensing results, the condition required in Theorem
3.3 involves the extra multiplicative factor $\ln(64 s)$. This factor does not appear in [Gro11, CP11],
but this is due to a mistake that was detected and corrected in [AH15]. Following the proofs
proposed in [AH15], we could in fact obtain a bound of type:

$$m \ge C' \cdot \Gamma(S, \pi) \ln\left( \frac{n}{\varepsilon} \right),$$

with $C' > C$. To the best of our knowledge, the ratio $C'/C$ obtained using the proof in [AH15]
is of order 24. This means that the new bound becomes interesting only for $s > 4 \cdot 10^8$, i.e.
in an asymptotic regime. In this paper, we therefore stick to the bound in Theorem 3.3 for
(i) simplifying the proof of the main result and (ii) obtaining the best results in a non-asymptotic
regime.
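
The threshold of about $4 \cdot 10^8$ can be recovered with a one-line computation, under the simplifying assumption that the two bounds differ only by the factor $\ln(64 s)$ on one side and the ratio $C'/C \approx 24$ on the other. The variable names are illustrative.

```python
import math

ratio = 24.0                      # order of magnitude of C'/C quoted in the remark
# ln(64 s) > C'/C  <=>  s > exp(C'/C) / 64
s_threshold = math.exp(ratio) / 64
```

The value obtained is close to $4.1 \cdot 10^8$, in line with the order of magnitude stated in the remark.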

3.3 Consequences for stochastic signal models 

The explicit dependency of $\Gamma$ on S allows us to consider the case of a random support S.

Proposition 3.6. Let $S \subset \{1, \dots, n\}$ denote a random support. For some real positive $\gamma$,
suppose that the event $\Gamma(S, \pi) \le \gamma$ occurs with probability larger than $1 - \varepsilon'(\gamma)$. If
$m \gtrsim \gamma \ln(s) \ln(n/\varepsilon)$, then x is the unique solution of Problem (1) with probability higher than
$1 - \varepsilon - \varepsilon'(\gamma)$.

Proof. Set $m \gtrsim \gamma \ln(s) \ln(n/\varepsilon)$. Define the event R "x is the unique solution of Problem (1)",
where R stands for "reconstruction of the signal". Define also A the event "$\Gamma(S, \pi) \le \gamma$". The
hypothesis of Proposition 3.6 and Theorem 3.3 give that $P(R|A) \ge 1 - \varepsilon$. To prove Proposition
3.6, we must quantify

$$P(R) = P(R \cap A) + P(R \cap A^c) = P(R|A) P(A) + P(R \cap A^c)$$
$$\ge (1 - \varepsilon)(1 - \varepsilon'(\gamma)) = 1 - \varepsilon - \varepsilon'(\gamma) + \varepsilon \varepsilon'(\gamma) \ge 1 - \varepsilon - \varepsilon'(\gamma),$$

which concludes the proof. ■

3.4 Choice of the drawing probability 

The choice of a drawing probability $\pi$ minimizing the required number of block measurements
in Theorem 3.3 is a delicate issue. The distribution $\pi^\Theta$ minimizing $\Theta(S, \pi)$ in Equation (6) can
be obtained explicitly:

$$\pi_k^\Theta = \frac{\|B_k^* B_{k,S}\|_{\infty \to \infty}}{\sum_{\ell=1}^M \|B_\ell^* B_{\ell,S}\|_{\infty \to \infty}}, \quad \text{for } 1 \le k \le M. \qquad (10)$$

Unfortunately, the minimization of $\Upsilon(S, \pi)$ with respect to $\pi$ seems much more involved and we
leave this issue as an open question in the general case.

Note however that in all the examples treated in the paper, we derive upper bounds depending
on $(S, \pi)$ for $\Upsilon(S, \pi)$ and $\Theta(S, \pi)$ that coincide. The distribution $\pi^\Theta$ is then set to minimize the
latter upper bound.

Note also that optimizing $\pi$ independently of S will result in a sole dependence on the degree
of sparsity $s = |S|$, which is not desirable if one wants to exploit structured sparsity.
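
The explicit minimizer of $\Theta(S, \pi)$, namely $\pi_k$ proportional to $\|B_k^* B_{k,S}\|_{\infty \to \infty}$, can be computed directly. The sketch below uses isolated rows of the identity matrix and a hypothetical support with 0-based indices; the function name is an assumption of this example.

```python
import numpy as np


def theta_optimal_distribution(blocks, support):
    """pi_k proportional to ||B_k^* B_{k,S}||_{inf->inf}; also return the normalizing sum."""
    n = blocks[0].shape[1]
    mask = np.zeros(n)
    mask[support] = 1.0
    norms = np.array([
        # B * mask zeroes the columns outside the support, i.e. it forms B_{k,S}.
        np.max(np.sum(np.abs(B.conj().T @ (B * mask)), axis=1)) for B in blocks
    ])
    return norms / norms.sum(), norms.sum()


n = 8
support = [1, 4, 6]                                   # hypothetical support, s = 3
identity_rows = [np.eye(n)[k:k + 1, :] for k in range(n)]
pi_opt, theta_at_optimum = theta_optimal_distribution(identity_rows, support)
```

For the identity matrix the optimal distribution is uniform on the support and zero elsewhere, and the value of $\Theta$ at the optimum is the normalizing sum, here s = 3.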


4 Applications 


In this section, we first show that Theorem 3.3 can be used to recover state of the art results in
the case of isolated measurements [CP11]. We then show that it allows recovering recent results
when a prior on the sparsity structure is available. The proposed theory however applies to a
wider setting even in the case of isolated measurements. Finally, we illustrate the consequences
of our results when the acquisition is constrained by blocks of measurements. In the latter
case, we show that the sparsity structure should be adapted to the sampling structure for exact
recovery.


4.1 Isolated measurements with arbitrary support 

First, we focus on an acquisition based on isolated measurements, which is the most widespread
in CS. This case corresponds to choosing blocks of the form $B_k = a_k^*$ for $1 \le k \le n$ with $M = n$,
where $a_k^*$ are the rows of an orthogonal matrix. In such a setting, the sensing matrix can be
written as follows

$$A = \frac{1}{\sqrt{m}} \left( \frac{1}{\sqrt{\pi_{K_\ell}}} a_{K_\ell}^* \right)_{1 \le \ell \le m}, \qquad (11)$$

where $(K_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of K such that $P(K = k) = \pi_k$, for $1 \le k \le n$.

We apply Theorem 3.3 when only the degree of sparsity s of the signal to reconstruct is
known. This is the setting considered in most CS papers (see e.g. [CRT06b, Rau10, CP11]). In
this context, our main result can be rewritten as follows.


Corollary 4.1. Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality s and suppose that $x \in \mathbb{C}^n$
is an s-sparse vector. Fix $\varepsilon \in (0, 1)$. Suppose that the sampling matrix A is constructed as in
(11). If

$$m \ge C \cdot s \cdot \max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k} \ln(s) \ln\left( \frac{n}{\varepsilon} \right), \qquad (12)$$

then x is the unique solution of (1) with probability at least $1 - \varepsilon$.

Moreover, the drawing distribution minimizing (12) is $\pi_k^\Theta = \frac{\|a_k\|_\infty^2}{\sum_{\ell=1}^n \|a_\ell\|_\infty^2}$, which leads to

$$m \ge C \cdot s \cdot \sum_{k=1}^n \|a_k\|_\infty^2 \ln(s) \ln\left( \frac{n}{\varepsilon} \right).$$

The proof is given in Appendix D.1.

Note that Corollary 4.1 is identical to Theorem 1.1 in [CP11] up to a logarithmic factor.
This result is usually used to explain the practical success of variable density sampling. It is the
core of papers such as [PVW11, KW14, CCKW14].
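
As a small illustration of this variable density rule, the sketch below computes the distribution proportional to $\|a_k\|_\infty^2$ and the resulting factor $\sum_k \|a_k\|_\infty^2$ for a 4 x 4 orthonormal Haar matrix. This matrix is a hypothetical choice of $A_0$ made only for the example, and the function name is illustrative.

```python
import numpy as np


def variable_density(A0):
    """Return pi_k proportional to ||a_k||_inf^2 and the factor sum_k ||a_k||_inf^2."""
    row_sup_sq = np.max(np.abs(A0), axis=1) ** 2
    total = row_sup_sq.sum()
    return row_sup_sq / total, total


r = 1.0 / np.sqrt(2.0)
haar = np.array([
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [r, -r, 0.0, 0.0],
    [0.0, 0.0, r, -r],
])
pi, factor = variable_density(haar)
```

The two fine-scale rows, which have the largest entries, are drawn twice as often as the two coarse-scale rows.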
4.2 Isolated measurements with structured sparsity


When using coherent transforms, meaning that the term $\max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k}$ in Equation (12) is
an increasing function of n, Corollary 4.1 is insufficient to justify the use of CS in applications.
In this section, we show that the proposed results allow justifying the use of CS even in the
extreme case where the sensing is performed with the canonical basis.


4.2.1 A toy example: sampling isolated measurements from the Identity matrix 
and knowing the support S 

Suppose that the signal x to reconstruct is S-sparse where $S \subset \{1, \dots, n\}$ is a fixed subset.
Consider the highly coherent case where $A_0 = \mathrm{Id}$. All current CS theories would give the same
unsatisfactory conclusion: it is not possible to use CS since $A_0$ is a perfectly coherent transform.
Indeed, the bound on the required number of isolated measurements given by standard CS
theories [CP11] reads as follows

$$m \ge C \cdot s \cdot \max_{1 \le k \le n} \frac{\|e_k\|_\infty^2}{\pi_k} \ln(n/\varepsilon) = C \cdot s \cdot \max_{1 \le k \le n} \frac{1}{\pi_k} \ln(n/\varepsilon).$$

Without any assumption on the support S, one can choose to draw the measurements uniformly
at random, i.e. $\pi_k = 1/n$ for $1 \le k \le n$. This particular choice leads to a required number of
measurements of the order

$$m \ge C \cdot s \cdot n \ln(n/\varepsilon),$$

which corresponds to fully sampling the acquisition space several times.

Let us now see what conclusion can be drawn with Theorem 3.3.


Corollary 4.2. Let $S \subset \{1, \dots, n\}$ be of cardinality s. Suppose that $x \in \mathbb{C}^n$ is an S-sparse vector.
Fix $\varepsilon \in (0, 1)$. Suppose that the sampling matrix A is constructed as in (11) with $A_0 = \mathrm{Id}$. Set
$\pi_k = \delta_{k,S}/s$ for $1 \le k \le n$, where $\delta_{k,S} = 1$ if $k \in S$, 0 otherwise. Suppose that

$$m \ge C \cdot s \cdot \ln(s) \ln\left( \frac{n}{\varepsilon} \right).$$

Then x is the unique solution of (1) with probability at least $1 - \varepsilon$.

With this new result, $O(s \ln(s) \ln(n))$ measurements are sufficient to reconstruct the signal
with a totally coherent transform. The least amount of measurements necessary to recover x is of order
$O(s \ln(s))$, by an argument of coupon collector effect [Fel08, p.262]. Therefore, Corollary 4.2 is
near-optimal up to logarithmic factors.
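
The coupon collector argument can be made quantitative: when indices are drawn uniformly from a support of size s, every index must be observed at least once, and the expected number of draws for this is $s H_s$ with $H_s$ the s-th harmonic number. The sketch below evaluates this classical formula; the function name and the value s = 10 are illustrative.

```python
import math


def expected_draws_to_cover(s):
    """Expected number of uniform draws from s items needed to see each one."""
    return s * sum(1.0 / k for k in range(1, s + 1))


s = 10
expected = expected_draws_to_cover(s)     # s * H_s
s_log_s = s * math.log(s)                 # leading-order term s ln(s)
```

For s = 10 the expectation is about 29.3 draws, compared with $s \ln(s) \approx 23.0$, consistent with the $s \ln(s)$ order of magnitude.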


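Proof. The result ensues from a direct evaluation of $\Gamma$. Indeed,

$$\|e_k e_{k,S}^*\|_{\infty \to \infty} = \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} |\langle e_i, e_k e_{k,S}^* v \rangle| = \sup_{\|v\|_\infty \le 1} |e_{k,S}^* v| = \delta_{k,S},$$

where $\delta_{k,S} = 1$ if $k \in S$, 0 otherwise. Therefore

$$\Theta = \max_{1 \le k \le n} \frac{\delta_{k,S}}{\pi_k}.$$

Then, we can write that

$$\Upsilon(S, \pi) = \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^n \frac{1}{\pi_k} |e_i^* e_k e_{k,S}^* v|^2 = \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \frac{|e_{i,S}^* v|^2}{\pi_i} = \max_{1 \le i \le n} \frac{\delta_{i,S}}{\pi_i}.$$

With $\pi_k = \delta_{k,S}/s$ and the convention $0/0 = 0$, both quantities equal s, so that $\Gamma(S, \pi) = s$.
To conclude the proof it suffices to apply Theorem 3.3. ■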
4.2.2 Isolated measurements when the degree of sparsity is structured by levels

In this part, we consider a partition of $\{1, \dots, n\}$ into levels $(\Omega_i)_{i=1,\dots,N} \subset \{1, \dots, n\}$ such that

$$\bigsqcup_{1 \le i \le N} \Omega_i = \{1, \dots, n\} \quad \text{and} \quad |\Omega_i| = N_i.$$

We consider that x is S-sparse with $|S \cap \Omega_i| = s_i$ for $1 \le i \le N$, meaning that restricted to the
level $\Omega_i$, the signal $P_{\Omega_i} x$ is $s_i$-sparse. This setting is studied extensively in the recent papers
[AHPR13, RHA14, BH14]. Theorem 3.3 provides the following guarantees.


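Corollary 4.3. Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality s, such that $|S \cap \Omega_i| = s_i$
for $1 \le i \le N$. Suppose that $x \in \mathbb{C}^n$ is an S-sparse vector. Fix $\varepsilon \in (0, 1)$. Suppose that the
sampling matrix A is constructed as in (11). If

$$m \ge C \cdot \max_{1 \le k \le n} \frac{1}{\pi_k} \sum_{\ell=1}^N s_\ell \|a_{k,\Omega_\ell}\|_\infty \|a_k\|_\infty \cdot \ln(s) \ln\left( \frac{n}{\varepsilon} \right), \qquad (13)$$

$$m \ge C \cdot \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^n \frac{1}{\pi_k} |e_i^* a_k|^2 |a_{k,S}^* v|^2 \cdot \ln(s) \ln\left( \frac{n}{\varepsilon} \right), \qquad (14)$$

then x is the unique solution of (1) with probability at least $1 - \varepsilon$.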


The proof of Corollary 4.3 is given in Appendix D.2.1. We show in Appendix D.2.2 that a
simple analysis leads to results that are nearly equivalent to those in [AHPR13]. It should be
noted that the term $\frac{\|a_{k,\Omega_\ell}\|_\infty \|a_k\|_\infty}{\pi_k}$ is related to the notion of local coherence defined in [AHPR13].
There are however a few differences making our approach potentially more interesting in the
case of isolated measurements:

• Our paper is based on i.i.d. sampling with an arbitrary drawing distribution. This leaves a
lot of freedom for generating sampling patterns and optimizing the probability $\pi$ in order
to minimize the upper-bounds (13) and (14). In contrast, the results in [AHPR13] are
based on uniform Bernoulli sampling over fixed levels. The dependency on the levels is
not explicit and it therefore seems complicated to optimize them.

• We can deal with a fixed support S, which enlarges the possibilities for structured sparsity.
It is also possible to consider random supports as explained in Proposition 3.6.


4.2.3 Isolated measurements for the Fourier-Haar transform 

The bounds in Corollary 4.3 are rather cryptic. They have to be analyzed separately for each sampling strategy. To conclude the discussion on isolated measurements, we provide a practical example with the 1D Fourier-Haar system.

We set $A_0 = \mathcal{F}\Psi^*$, where $\mathcal{F} \in \mathbb{C}^{n \times n}$ is the 1D Fourier transform and $\Psi^* \in \mathbb{C}^{n \times n}$ is the 1D inverse wavelet transform. To simplify the notation, we assume that $n = 2^{J+1}$ and we decompose the signal at the maximum level $J = \log_2(n) - 1$. In order to state our result, we introduce a dyadic partition $(\Omega_j)_{0 \le j \le J}$ of the set $\{1,\dots,n\}$. We set $\Omega_0 = \{1\}$, $\Omega_1 = \{2\}$, $\Omega_2 = \{3,4\}$, ..., $\Omega_J = \{n/2+1,\dots,n\}$. We also define the function $j : \{1,\dots,n\} \to \{0,\dots,J\}$ by $j(u) = j$ if $u \in \Omega_j$.

Corollary 4.4. Let $S \subset \{1,\dots,n\}$ be a set of indices of cardinality $s$, such that $|S \cap \Omega_j| = s_j$ for $0 \le j \le J$. Suppose that $x \in \mathbb{C}^n$ is an $s$-sparse vector supported on $S$. Fix $\varepsilon \in (0,1)$. Suppose that $A$ is constructed from the Fourier-Haar transform $A_0$. Choose $(\pi_k)_{1 \le k \le n}$ to be constant by level, i.e. $\pi_k = \tilde{\pi}_{j(k)}$. If

\[ m \ge C \cdot \max_{0 \le j \le J} \frac{1}{\tilde{\pi}_j} 2^{-j} \sum_{p=0}^{J} 2^{-|j-p|/2} s_p \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right), \tag{15} \]

then $x$ is the unique solution of Problem (1) with probability at least $1 - \varepsilon$.
In particular, the distribution minimizing (15) is

\[ \tilde{\pi}_j = \frac{2^{-j} \sum_{p=0}^{J} 2^{-|j-p|/2} s_p}{\sum_{\ell=1}^{n} 2^{-j(\ell)} \sum_{p=0}^{J} 2^{-|j(\ell)-p|/2} s_p}, \qquad 0 \le j \le J, \]

which leads to

\[ m \ge C \cdot \sum_{j=0}^{J} \Big( s_j + \sum_{p=0,\, p \ne j}^{J} 2^{-|j-p|/2} s_p \Big) \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right). \tag{16} \]

The proof is presented in Section D.3. This corollary is once again similar to the results in [AHR14b]. The number of measurements in each level $j$ should depend on the degree of sparsity $s_j$ but also on the degree of sparsity of the other levels, whose contribution is more and more attenuated as the level gets farther from the $j$-th one.
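The level-dependent distribution above can be explored numerically. The following is a minimal sketch, assuming the level sizes implied by the sets listed in the text ($|\Omega_0| = 1$ and $|\Omega_j| = 2^{j-1}$ for $j \ge 1$); the function names and the sparsity profile `s` are hypothetical, and the constant $C$ and the logarithmic factors are left out.

```python
def level_sizes(J):
    """Sizes of the dyadic levels {1}, {2}, {3,4}, ... as listed in the text."""
    return [1] + [2 ** (j - 1) for j in range(1, J + 1)]

def weighted_sparsity(s, j):
    """sum_p 2^{-|j-p|/2} s_p: the sparsity profile as seen from level j."""
    return sum(2.0 ** (-abs(j - p) / 2.0) * sp for p, sp in enumerate(s))

def optimal_level_probabilities(s):
    """Per-index probabilities, proportional to 2^{-j} * weighted_sparsity(s, j)."""
    J = len(s) - 1
    sizes = level_sizes(J)
    raw = [2.0 ** (-j) * weighted_sparsity(s, j) for j in range(J + 1)]
    total = sum(size * r for size, r in zip(sizes, raw))  # normalise over all indices
    return [r / total for r in raw]

def bound_15(s, pi_tilde):
    """max_j 2^{-j} * weighted_sparsity(s, j) / pi_tilde[j] (no C, no log factors)."""
    return max(2.0 ** (-j) * weighted_sparsity(s, j) / pi_tilde[j]
               for j in range(len(s)))

s = [1, 1, 2, 3, 2, 1]                       # hypothetical sparsity per level
sizes = level_sizes(len(s) - 1)
pi_opt = optimal_level_probabilities(s)
n = sum(sizes)
uniform = [1.0 / n] * len(s)
print("optimised bound:", bound_15(s, pi_opt))
print("uniform bound:  ", bound_15(s, uniform))
```

With the optimised probabilities every level attains the same ratio, so the maximum in (15) cannot be lowered by moving probability mass between levels; the uniform distribution is printed only for comparison.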

Remark 4.5. The Fourier-Wavelet system is coherent and the initial compressed sensing theories cannot explain the success of sampling strategies with such a transform. To overcome the coherence, two strategies have been devised. The first one is based on variable density sampling (see e.g. [PMG+12], [CCKW14], [KW14]). The second one is based on variable density sampling and an additional structured sparsity assumption (see e.g. [AHPR13] and Corollary 4.4). First, note that the results obtained with the latter approach allow recovering signals with arbitrary supports. Indeed,

\[ \sum_{j=0}^{J} \Big( s_j + \sum_{p=0,\, p \ne j}^{J} 2^{-|j-p|/2} s_p \Big) \lesssim s. \]

Second, it is not clear yet, from a theoretical point of view, that the structure assumption allows obtaining better guarantees. Indeed, it is possible to show that variable density sampling alone leads to perfect reconstruction from $m \propto s \ln(n)^2$ measurements, which is on par with bound (16). It will become clear that structured sparsity is essential when using the Fourier-Wavelet systems with structured acquisition. Moreover, the numerical experiments conducted in [AHR14b] leave no doubt that structured sparsity is essential to ensure good reconstruction with a low number of measurements.
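As a worked check of the claim that the sum appearing in (16) is of the order of $s$, one can exchange the two sums and bound the geometric series. This is a sketch with an explicit constant that is not stated in the text:

\[
\begin{aligned}
\sum_{j=0}^{J} \sum_{p=0}^{J} 2^{-|j-p|/2} s_p
&= \sum_{p=0}^{J} s_p \sum_{j=0}^{J} 2^{-|j-p|/2}
\le \sum_{p=0}^{J} s_p \Big( 1 + 2 \sum_{k \ge 1} 2^{-k/2} \Big) \\
&= \Big( 1 + \frac{2}{\sqrt{2} - 1} \Big) s = (3 + 2\sqrt{2})\, s \approx 5.83\, s .
\end{aligned}
\]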


4.3 Structured acquisition and structured sparsity 


In this paragraph, we illustrate how Theorem 3.3 explains the practical success of structured acquisition in applications. We will mainly focus on the 2D setting: the vector $x \in \mathbb{C}^n$ to reconstruct can be seen as an image of size $\sqrt{n} \times \sqrt{n}$.


4.3.1 The limits of structured acquisition 

In [BBW14], [PDG15], the authors provided theoretical CS results when using block-constrained acquisitions. Moreover, the results in [BBW14] are proved to be tight in many practical situations. Unfortunately, the bounds on the number of blocks of measurements necessary for perfect reconstruction are incompatible with a faster acquisition.

To illustrate this fact, let us recall a typical result from [BBW14]. It shows that the recovery of sparse vectors with an arbitrary support is of little interest when sampling lines of tensor product transforms. This setting is widely used in imaging. It corresponds to the MRI sampling strategy proposed in [LDP07].

Proposition 4.6 ([BBW14]). Suppose that $A_0 = \phi \otimes \phi \in \mathbb{C}^{n \times n}$ is a 2D separable transform, where $\phi \in \mathbb{C}^{\sqrt{n} \times \sqrt{n}}$ is an orthogonal transform. Consider blocks of measurements made of $\sqrt{n}$ horizontal lines in the 2D acquisition space, i.e. for $1 \le k \le \sqrt{n}$,

\[ B_k = \big( \phi_{k,1}\phi, \dots, \phi_{k,\sqrt{n}}\phi \big). \]

If the number of acquired lines $m$ is less than $\min(2s, \sqrt{n})$, then there exists no decoder $\Delta$ such that $\Delta(Ax) = x$ for all $s$-sparse vectors $x \in \mathbb{C}^n$.

In other words, the minimal number $m$ of distinct blocks required to identify every $s$-sparse vector is necessarily larger than $\min(2s, \sqrt{n})$.


This theoretical bound is quite surprising: it seems to contradict the practical results obtained in Figure 2 or with one of the most standard CS strategies in MRI [LDP07]. Indeed, the equivalent number of isolated measurements required by Proposition 4.6 is of the order $O(s\sqrt{n})$. This theoretical result means that in many applications, a full sampling strategy should be adopted when the acquisition is structured by horizontal lines. In the next paragraphs, we show how Theorem 3.3 allows bridging the gap between theoretical recovery and practical experiments.


4.3.2 Breaking the limits with adapted structured sparsity 

In this paragraph, we illustrate through a simple example that additional assumptions on structured sparsity are the key to explaining practical results.

Corollary 4.7. Let $A_0 \in \mathbb{C}^{n \times n}$ be the 2D Fourier transform. Assume that $x$ is a 2D signal with support $S$ concentrated on $q$ horizontal lines of the spatial plane, i.e.

\[ S \subset \{ (j-1)\sqrt{n} + \{1,\dots,\sqrt{n}\},\ j \in \mathcal{J} \}, \tag{17} \]

where $\mathcal{J} \subset \{1,\dots,\sqrt{n}\}$ and $|\mathcal{J}| = q$.

Choose a uniform sampling strategy among the $\sqrt{n}$ horizontal lines, i.e. $\pi_k = 1/\sqrt{n}$ for $1 \le k \le \sqrt{n}$. The number $m$ of sampled horizontal lines sufficient to reconstruct $x$ with probability $1 - \varepsilon$ is

\[ m \ge C \cdot q \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right). \]


The proof is given in Appendix D.4. By Corollary 4.7, we can observe that the required number of sampled lines is of the order of the number of non-zero lines in the 2D signal. In comparison, Proposition 4.6 in [BBW14] (with no structured sparsity) requires

\[ m \gtrsim s \cdot \ln(n/\varepsilon) \]

measurements to get the same guarantees. This means that the required number of horizontal lines to sample is of the order of the number of non-zero coefficients. Putting aside the logarithmic factors, we see that the gain with our new approach is considerable. Clearly, our strategy is able to take advantage of the sparsity structure of the signal of interest.


4.3.3 Consequences for MRI sampling 

We now turn to a real MRI application. We assume that the sensing matrix $A_0 \in \mathbb{C}^{n \times n}$ is the product of the 2D Fourier transform $\mathcal{F}_{2D}$ with the inverse 2D wavelet transform $\Psi^*$. We aim at reconstructing a vector $x \in \mathbb{C}^n$ that can be seen as a 2D wavelet transform with $\sqrt{n} \times \sqrt{n}$ coefficients. Set $J = \log_2(\sqrt{n}) - 1$ and let $(\tau_j)_{0 \le j \le J}$ denote a dyadic partition of the set $\{1,\dots,\sqrt{n}\}$, i.e. $\tau_0 = \{1\}$, $\tau_1 = \{2\}$, $\tau_2 = \{3,4\}$, ..., $\tau_J = \{\sqrt{n}/2+1,\dots,\sqrt{n}\}$. Define $j : \{1,\dots,\sqrt{n}\} \to \{0,\dots,J\}$ by $j(u) = j$ if $u \in \tau_j$. Finally, define the sets $\Omega_{\ell,\ell'} = \tau_\ell \times \tau_{\ell'}$, for $0 \le \ell, \ell' \le J$. See Figure 3 for an illustration of these sets.

Definition 4.8. Given $S = \mathrm{supp}(x)$, define the following quantity

\[ s_\ell^c := \max_{0 \le \ell' \le J} \max_{k \in \tau_{\ell'}} |S \cap \Omega_{\ell,\ell'} \cap C_k|, \tag{18} \]

where $C_k$ represents the set corresponding to the $k$-th vertical line (see Figure 3).

Figure 3: 2D view of the signal $x \in \mathbb{C}^n$ to reconstruct. The vector $x$ can be reshaped in a $\sqrt{n} \times \sqrt{n}$ matrix. $C_k$ represents the coefficient indexes corresponding to the $k$-th vertical column.

The quantity $s_\ell^c$ represents the maximal sparsity of $x$ restricted to columns (or vertical lines) of $\cup_{\ell'} \Omega_{\ell,\ell'}$. We have now settled everything to state our result.
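To make the column-wise sparsity of Definition 4.8 concrete, here is a minimal sketch that computes one value per horizontal level from a support given as a set of 1-based (row, column) pairs. It assumes that the first coordinate of $\tau_\ell \times \tau_{\ell'}$ indexes rows and the second indexes columns; the function names are hypothetical.

```python
def dyadic_level(u):
    """Level of a 1-based index u in the partition {1}, {2}, {3,4}, {5..8}, ..."""
    return 0 if u == 1 else (u - 1).bit_length()

def column_sparsities(support, side):
    """For each row level l, the largest number of support points that a single
    column contains among the rows of that level.

    support: set of 1-based (row, col) pairs in a side x side grid,
    side: a power of two.
    """
    top_level = dyadic_level(side)
    counts = {}                                   # (row level, column) -> count
    for row, col in support:
        key = (dyadic_level(row), col)
        counts[key] = counts.get(key, 0) + 1
    s_c = [0] * (top_level + 1)
    for (level, _col), c in counts.items():
        s_c[level] = max(s_c[level], c)
    return s_c

side = 8
one_column = {(r, 3) for r in range(1, side + 1)}   # support filling column 3
one_row = {(6, c) for c in range(1, side + 1)}      # support filling row 6
print(column_sparsities(one_column, side))
print(column_sparsities(one_row, side))
```

The two toy supports have the same cardinality but very different column-wise sparsities, which is the kind of asymmetry the definition is designed to capture.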

As a first step, we will consider the case of Shannon's wavelets, leading to a block-diagonal sampling matrix $A_0$.

Corollary 4.9. Let $S \subset \{1,\dots,n\}$ be a set of indices of cardinality $s$. Suppose that $x \in \mathbb{C}^n$ is an $s$-sparse vector supported on $S$. Fix $\varepsilon \in (0,1)$. Suppose that $A_0$ is the product of the 2D Fourier transform with the 2D inverse Shannon's wavelets transform. Consider that the blocks of measurements are the $\sqrt{n}$ horizontal lines in the 2D setting. Choose $(\pi_k)_{1 \le k \le \sqrt{n}}$ to be constant by level, i.e. $\pi_k = \tilde{\pi}_{j(k)}$. If the number of horizontal lines to acquire satisfies

\[ m \ge C \cdot \max_{0 \le j \le J} \frac{1}{\tilde{\pi}_j} 2^{-j} s_j^c \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right), \]

then $x$ is the unique solution of Problem (1) with probability at least $1 - \varepsilon$. Furthermore, choosing

\[ \tilde{\pi}_j = \frac{2^{-j} s_j^c}{\sum_{\ell=0}^{J} s_\ell^c}, \qquad \text{for } 0 \le j \le J, \]

leads to the following upper bound

\[ m \ge C \cdot \sum_{j=0}^{J} s_j^c \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right). \]

The proof is given in Section D.5.

Corollary 4.9 shows that the number of lines acquired at level $j$ depends only on an extra-column structure of $S$. Now let us turn to a case where the matrix $A_0$ is not block-diagonal anymore.


Corollary 4.10. Suppose that $x \in \mathbb{C}^n$ is an $S$-sparse vector. Fix $\varepsilon \in (0,1)$. Suppose that $A_0$ is the product of the 2D Fourier transform with the 2D inverse Haar transform. Consider that the blocks of measurements are the $\sqrt{n}$ horizontal lines. Choose $(\pi_k)_{1 \le k \le \sqrt{n}}$ to be constant by level, i.e. $\pi_k = \tilde{\pi}_{j(k)}$.

If the number $m$ of drawn horizontal lines satisfies

\[ m \ge C \cdot \max_{0 \le j \le J} \frac{1}{\tilde{\pi}_j} 2^{-j} \sum_{r=0}^{J} 2^{-|j-r|/2} s_r^c \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right), \]

then $x$ is the unique solution of Problem (1) with probability $1 - \varepsilon$. In particular, if

\[ \pi_k = \frac{2^{-j(k)} \sum_{r=0}^{J} 2^{-|j(k)-r|/2} s_r^c}{\sum_{\ell=1}^{\sqrt{n}} 2^{-j(\ell)} \sum_{r=0}^{J} 2^{-|j(\ell)-r|/2} s_r^c}, \]

then

\[ m \ge C \cdot \sum_{j=0}^{J} \Big( s_j^c + \sum_{r=0,\, r \ne j}^{J} 2^{-|j-r|/2} s_r^c \Big) \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right) \]

ensures perfect reconstruction with probability $1 - \varepsilon$.

The proof of Corollary 4.10 is given in Section D.6.


This result indicates that the number of acquired lines in the "horizontal" level $j$ should be chosen depending on the quantities $s_j^c$. Note that this is very different from the sparsity by levels proposed in [AHPR13]. In conclusion, Corollary 4.10 reveals that with a structured acquisition, the sparsity needs to be more structured in order to guarantee exact recovery. To the best of our knowledge, this is the first theoretical result which can explain why sampling lines in MRI as in [LDP07] might work. In Figure 4, we illustrate that the results in Corollary 4.10 seem to correspond to practical observations. In this experiment, we seek to reconstruct a reeds image from block-structured measurements. As test images, we chose a reeds image with vertical stripes and its rotated version. This particular geometrical structure explains why the quantities $s_j^c$ are much higher for the horizontal stripes than for the vertical ones. As can be seen, the image with a low $s_j^c$ is much better reconstructed than the one with a high $s_j^c$. This was predicted by our theory.

[Figure 4 panels: sampling scheme (top); (a) original image, (b) reconstruction with SNR = 27.8 dB, $s^c = (16, 16, 32, 59, 81, 75, 48)$; (c) original image, (d) reconstruction with SNR = 14.7 dB, $s^c = (16, 16, 32, 64, 124, 240, 411)$.]

Figure 4: An example of reconstruction of a 2048 x 2048 real image sensed in the Fourier domain. In (a) (c), reference images to reconstruct: (c) is the same image as (a) but rotated by 90°. We specify the value of the vector $s^c = (s_j^c)_{1 \le j \le 7}$ for both images. Note that the quantities $s_j^c$ are larger in the case of image (c). For the reconstruction, we use the sampling scheme at the top of the figure. It corresponds to 9.8 % of measurements. In (b) (d), corresponding reconstruction via $\ell^1$-minimization. We have rotated the image in (d) to facilitate the comparison between both. Note that (b) is much better reconstructed than (d). This is predicted by Corollary 4.10.
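The two vectors $s^c$ reported with Figure 4 can be plugged into the level-coupled sum that appears in the last bound of Corollary 4.10. The sketch below is only an illustration: it drops the constant $C$ and the logarithmic factors, and the function name is hypothetical.

```python
def corollary_4_10_sum(s_c):
    """sum_j sum_r 2^{-|j-r|/2} * s_c[r]; the r = j term contributes s_c[j] itself."""
    levels = range(len(s_c))
    return sum(2.0 ** (-abs(j - r) / 2.0) * s_c[r] for j in levels for r in levels)

s_c_first = [16, 16, 32, 59, 81, 75, 48]        # reported with panels (a)-(b)
s_c_second = [16, 16, 32, 64, 124, 240, 411]    # reported with panels (c)-(d)

first = corollary_4_10_sum(s_c_first)
second = corollary_4_10_sum(s_c_second)
print(round(first), round(second), round(second / first, 2))
```

The ratio of the two sums gives a rough idea of how many more horizontal lines the bound asks for on the second image, which is consistent in direction with the gap in reconstruction quality reported in the figure.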
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5 Extensions 


5.1 The case of Bernoulli block sampling 

We analyzed the combination of structured acquisition and structured sparsity with i.i.d. drawings of random blocks. These results can be extended to a Bernoulli sampling setting. In such a setting, the sensing matrix is constructed as follows

\[ A = \left( \frac{\delta_k}{\sqrt{\pi_k}} B_k \right)_{1 \le k \le M}, \]

where $(\delta_k)_{1 \le k \le M}$ are independent Bernoulli random variables such that $\mathbb{P}(\delta_k = 1) = \pi_k$, for $1 \le k \le M$. We may set $\sum_{k=1}^{M} \pi_k = m$ in order to measure $m$ blocks of measurements in expectation. By considering the same definition for $\Gamma(S, \pi)$ with $(\pi_k)_{1 \le k \le M}$ the Bernoulli weights, it is possible, for the case of Bernoulli block sampling, to give a reconstruction result that shares a similar flavor to Theorem 3.3.
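The Bernoulli selection step can be sketched in a few lines. This is an illustration only: the weight profile, the helper names and the budget are hypothetical, and the rescaling simply enforces that the probabilities sum to the expected number of blocks $m$.

```python
import random

def scale_to_budget(weights, m):
    """Rescale positive weights so that they sum to m; each result must be <= 1."""
    total = sum(weights)
    pi = [m * w / total for w in weights]
    if max(pi) > 1.0:
        raise ValueError("budget m is too large for these weights")
    return pi

def bernoulli_block_selection(pi, rng):
    """Indices k kept by independent Bernoulli(pi[k]) draws."""
    return [k for k, p in enumerate(pi) if rng.random() < p]

rng = random.Random(0)
M, m = 64, 8
weights = [1.0 / (1 + k) ** 0.5 for k in range(M)]    # hypothetical decaying profile
pi = scale_to_budget(weights, m)
counts = [len(bernoulli_block_selection(pi, rng)) for _ in range(2000)]
print(sum(counts) / len(counts))    # empirical mean number of selected blocks
```

Unlike i.i.d. drawing, no block can be selected twice here, and the number of acquired blocks is random with mean $m$.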


5.2 Towards new sampling schemes? 


The results in Section 4.3.3 lead to the conclusion that exact recovery with structured acquisition can only occur if the signal to reconstruct possesses an adequate sparsity pattern. We believe that the proposed theorems might help in designing new efficient and feasible sampling schemes. Ideally, this could be done by optimizing $\Gamma(S, \pi)$ assuming that $S$ belongs to some set of realistic signals. Unfortunately, this optimization seems unrealistic to perform numerically, owing to the huge dimensions of the objects involved. We therefore leave this question open for future work.


However, probing the limits of a given system, as was proposed in Corollary 4.10, helps in designing better sampling schemes. To illustrate this fact, we performed a simple experiment. Since the quantity $s_j^c$ is critical to characterize a sampler's efficiency, it is likely that mixing horizontal and vertical sampling lines improves the situation. We aim at reconstructing the MR image shown in Figure 5 and assume that it is sparse in the wavelet basis. In Figure 5 (a) (d), we propose two different sampling schemes. The first one is based solely on parallel lines in the horizontal direction, while the second one is based on a combination of vertical and horizontal lines. The combination of vertical and horizontal lines provides much better reconstruction results despite a lower total number of measurements.

[Figure 5 panels: reference image (top); (d) sampling scheme, (e) reconstruction with SNR = 26.74 dB, (f) zoom.]

Figure 5: An example of MRI reconstruction of a 2048 x 2048 phantom. The reference image to reconstruct is presented at the top of the figure. It is considered sparse in the wavelet domain. In (a) (d), we present two kinds of sampling schemes with 20 % of measurements: the samples are acquired in the 2D Fourier domain. In (b) (e), we show the corresponding reconstruction via $\ell^1$-minimization. In (c) (f), we enhance the results by zooming on the reconstructed images. Note that the horizontal and vertical sampling scheme produces much better reconstruction results despite a smaller number of measurements, since samples are overlapping. Moreover, the acquisition time would be exactly the same for an MRI.

A Proofs of the main results


A.1 Proof of Theorem 3.3

In this section, we give sufficient conditions to guarantee that the vector $x$ is the unique minimizer of Problem (1), using an inexact dual certificate, see [CP11].

Lemma A.1 (Inexact duality [CP11]). Suppose that $x \in \mathbb{R}^n$ is supported on $S \subset \{1,\dots,n\}$. Assume that $A_S$ is full column rank and that

\[ \|(A_S^* A_S)^{-1}\|_{2 \to 2} \le 2 \quad \text{and} \quad \max_{i \in S^c} \|A_S^* A e_i\|_2 \le 1, \tag{19} \]

where $(A_S^* A_S)^{-1}$ only makes sense on the set $\mathrm{span}\{e_i, i \in S\}$. Moreover, suppose that there exists $v \in \mathbb{R}^n$ in the row space of $A$ obeying

\[ \|v_S - \mathrm{sign}(x_S)\|_2 \le 1/4 \quad \text{and} \quad \|v_{S^c}\|_\infty \le 1/4. \tag{20} \]

Then, the vector $x$ is the unique solution of the minimization problem (1).


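First, let us focus on Conditions (19). Remark that $A_S^* A_S$ is invertible by assuming that $A_S$ is full column-rank. Moreover,

\[ \|(A_S^* A_S)^{-1}\|_{2 \to 2} = \Big\| \sum_{k=0}^{\infty} (P_S - A_S^* A_S)^k \Big\|_{2 \to 2} \le \sum_{k=0}^{\infty} \|A_S^* A_S - P_S\|_{2 \to 2}^k = \big( 1 - \|A_S^* A_S - P_S\|_{2 \to 2} \big)^{-1}. \]

Therefore, if $\|A_S^* A_S - P_S\|_{2 \to 2} \le 1/2$ is satisfied, then $\|(A_S^* A_S)^{-1}\|_{2 \to 2} \le 2$. Moreover, by Lemma C.1, $\|(A_S^* A_S)^{-1}\|_{2 \to 2} \le 2$ with probability at least $1 - \varepsilon$, provided that

\[ m \ge \frac{28}{3} \Theta(S, \pi) \ln\left(\frac{2s}{\varepsilon}\right). \]

By definition of $\Gamma(S, \pi)$, the first inequality of Conditions (19) is therefore ensured with probability larger than $1 - \varepsilon$ if

\[ m \ge \frac{28}{3} \Gamma(S, \pi) \ln\left(\frac{2s}{\varepsilon}\right). \tag{21} \]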
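As a sketch of where the constant $28/3$ comes from, assume that the matrix Bernstein inequality (Theorem B.3) is applied to the sum of $m$ centered matrices with variance proxy $\sigma^2 \le m\Theta$ and uniform bound $K = \Theta$, at deviation $t = m\delta$, and that the dimension factor in front of the exponential is $2s$. Evaluating at $\delta = 1/2$ gives

\[
2s \exp\left( - \frac{m \delta^2 / 2}{\Theta (1 + \delta/3)} \right) \Bigg|_{\delta = 1/2}
= 2s \exp\left( - \frac{3m}{28\, \Theta} \right) \le \varepsilon
\quad \Longleftrightarrow \quad
m \ge \frac{28}{3}\, \Theta \ln\left( \frac{2s}{\varepsilon} \right).
\]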


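Furthermore, using Lemma C.5 we obtain that

\[ \max_{i \in S^c} \|A_S^* A e_i\|_2 \le 1 \]

with probability larger than $1 - \varepsilon$ if

\[ m \ge \Theta(S, \pi) \left( 1 + 4\sqrt{\ln\left(\frac{n}{\varepsilon}\right)} + 4 \ln\left(\frac{n}{\varepsilon}\right) \right). \]

Again by definition of $\Gamma(S, \pi)$, the second part of Conditions (19) is ensured if $n \ge 3$ and

\[ m \ge 9\, \Gamma(S, \pi) \ln\left(\frac{n}{\varepsilon}\right). \tag{22} \]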


Conditions (20) remain to be verified. The rest of the proof of Theorem 3.3 relies on the construction of a vector $v$ satisfying the conditions described in Lemma A.1 with high probability. To do so, we adapt the so-called golfing scheme introduced by Gross [Gro11] to our setting. More precisely, we will iteratively construct a vector that converges to a vector $v$ satisfying (20) with high probability.

Let us first partition the sensing matrix $A$ into blocks of blocks so that, from now on, we denote by $A^{(1)}$ the first $m_1$ blocks of $A$, $A^{(2)}$ the next $m_2$ blocks, and so on. The $L$ random matrices $\{A^{(\ell)}\}_{\ell = 1,\dots,L}$ are independently distributed, and we have that $m = m_1 + m_2 + \dots + m_L$. As explained before, $A_S^{(\ell)}$ denotes the matrix $A^{(\ell)} P_S$.

The golfing scheme starts by defining $v^{(0)} = 0$, and then it iteratively defines

\[ v^{(\ell)} = \frac{m}{m_\ell} A^{(\ell)*} A_S^{(\ell)} \big( \mathrm{sign}(x) - v^{(\ell-1)} \big) + v^{(\ell-1)}, \tag{23} \]

for $\ell = 1,\dots,L$, where $\mathrm{sign}(x_i) = 0$ if $x_i = 0$. In the rest of the proof, we set $v = v^{(L)}$. By construction, $v$ is in the row space of $A$. The main idea of the golfing scheme is then to combine the results from the various lemmas in Section C with an appropriate choice of $L$ to show that the random vector $v$ satisfies the assumptions of Lemma A.1 with large probability. Using the shorthand notation $v_S^{(\ell)} = P_S v^{(\ell)}$, let us define

\[ w^{(\ell)} = \mathrm{sign}(x) - v_S^{(\ell)}, \qquad \ell = 0,\dots,L, \]

where $x \in \mathbb{C}^n$ is the solution of Problem (1).

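From the definition of $v^{(\ell)}$, it follows that, for any $1 \le \ell \le L$,

\[ w^{(\ell)} = \Big( P_S - \frac{m}{m_\ell} A_S^{(\ell)*} A_S^{(\ell)} \Big) w^{(\ell-1)} = \prod_{j=1}^{\ell} \Big( P_S - \frac{m}{m_j} A_S^{(j)*} A_S^{(j)} \Big) \mathrm{sign}(x), \tag{24} \]

and

\[ v = \sum_{\ell=1}^{L} \frac{m}{m_\ell} A^{(\ell)*} A_S^{(\ell)} w^{(\ell-1)}. \tag{25} \]

Note that in particular, $w^{(0)} = \mathrm{sign}(x)$ and $w^{(L)} = \mathrm{sign}(x) - v_S$. In what follows, it will be shown that the matrices $P_S - \frac{m}{m_\ell} A_S^{(\ell)*} A_S^{(\ell)}$ are contractions and that the norm of the vector $w^{(\ell)}$ decreases geometrically fast with $\ell$. Therefore, $v_S^{(\ell)}$ becomes close to $\mathrm{sign}(x_S)$ as $\ell$ tends to $L$. In particular, we will prove that $\|w^{(L)}\|_2 \le 1/4$ for a suitable choice of $L$. In addition, we also show that $v$ satisfies the condition $\|v_{S^c}\|_\infty \le 1/4$. All these conditions will be shown to be satisfied with a large probability (depending on $\varepsilon$).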


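For all $1 \le \ell \le L$, assume that

\[ \|w^{(\ell)}\|_2 \le r_\ell \|w^{(\ell-1)}\|_2, \tag{C1-$\ell$} \]

\[ \Big\| \frac{m}{m_\ell} \big( A_{S^c}^{(\ell)} \big)^* A_S^{(\ell)} w^{(\ell-1)} \Big\|_\infty \le t_\ell \|w^{(\ell-1)}\|_\infty, \tag{C2-$\ell$} \]

\[ \Big\| \Big( \frac{m}{m_\ell} \big( A_S^{(\ell)} \big)^* A_S^{(\ell)} - P_S \Big) w^{(\ell-1)} \Big\|_\infty \le t'_\ell \|w^{(\ell-1)}\|_\infty, \tag{C3-$\ell$} \]

with

(i) $L = 2 + \left\lceil \frac{\ln(s)}{2 \ln 2} \right\rceil$,

(ii) $r_\ell = \frac{1}{2}$, for $\ell = 1,\dots,L$,

(iii) $t_\ell = t'_\ell = \frac{1}{5}$ for $\ell = 1,\dots,L$.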


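Note that using (C1-$\ell$), we can write that

\[ \|\mathrm{sign}(x_S) - v_S\|_2 = \|w^{(L)}\|_2 \le \|\mathrm{sign}(x_S)\|_2 \prod_{\ell=1}^{L} r_\ell \le \sqrt{s} \prod_{\ell=1}^{L} r_\ell \le \frac{\sqrt{s}}{2^L} \le \frac{1}{4}, \tag{26} \]

where the last inequality follows from the previously specified choice of $L$.

Furthermore, Equations (C2-$\ell$) and (C3-$\ell$) imply that

\[
\begin{aligned}
\|v_{S^c}\|_\infty &= \Big\| \sum_{\ell=1}^{L} \frac{m}{m_\ell} \big( A_{S^c}^{(\ell)} \big)^* A_S^{(\ell)} w^{(\ell-1)} \Big\|_\infty
\le \sum_{\ell=1}^{L} \Big\| \frac{m}{m_\ell} \big( A_{S^c}^{(\ell)} \big)^* A_S^{(\ell)} w^{(\ell-1)} \Big\|_\infty \\
&\le \sum_{\ell=1}^{L} t_\ell \|w^{(\ell-1)}\|_\infty
\le \sum_{\ell=1}^{L} t_\ell \prod_{j=1}^{\ell-1} t'_j
\le \frac{1}{5} \cdot \frac{1 - (1/5)^L}{1 - 1/5} < \frac{1}{4}.
\end{aligned} \tag{27}
\]

Note that in Inequality (27), the control of the operator norms $\infty \to \infty$ avoids the appearance of $\sqrt{s}$ as in the usual golfing scheme of [CP11]. Indeed, in our proof strategy, we have used the fact that $\|w^{(0)}\|_\infty = \|\mathrm{sign}(x_S)\|_\infty = 1$, whereas in [CP11] $\|w^{(0)}\|_2 = \|\mathrm{sign}(x_S)\|_2 \le \sqrt{s}$ is involved. This is a key step in the proof, since the absence of the degree of sparsity at this stage allows us to derive results depending only on $S$ and not on its cardinality $s = |S|$.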
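The two numerical facts used in (26) and (27) can be checked directly. The following is a small sketch, assuming the parameter choices $L = 2 + \lceil \ln(s)/(2\ln 2) \rceil$, contraction factor $1/2$ and sup-norm factors $1/5$; the function names are hypothetical.

```python
import math

def golfing_rounds(s):
    """L = 2 + ceil(ln(s) / (2 ln 2)), the number of golfing rounds."""
    return 2 + math.ceil(math.log(s) / (2.0 * math.log(2.0)))

def l2_residual_bound(s, r=0.5):
    """sqrt(s) * r**L: bound on the l2 distance between v_S and sign(x_S)."""
    return math.sqrt(s) * r ** golfing_rounds(s)

def off_support_bound(s, t=0.2, t_prime=0.2):
    """sum_{l=1}^{L} t * t_prime**(l-1): bound on the sup norm of v off the support."""
    rounds = golfing_rounds(s)
    return sum(t * t_prime ** (l - 1) for l in range(1, rounds + 1))

for s in (1, 2, 10, 1000, 10 ** 6):
    print(s, golfing_rounds(s), l2_residual_bound(s), off_support_bound(s))
```

Both quantities stay below $1/4$ for every sparsity level, and the number of rounds only grows logarithmically with $s$.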


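We denote by $p_1(\ell)$, $p_2(\ell)$ and $p_3(\ell)$ the probabilities that the upper bounds (C1-$\ell$), (C2-$\ell$) and (C3-$\ell$) do not hold.

Let us call "failure C" the event in which one of the $3L$ inequalities (C1-$\ell$), (C2-$\ell$), (C3-$\ell$) is not satisfied. Then,

\[ \mathbb{P}(\text{failure C}) \le \sum_{\ell=1}^{L} \mathbb{P}(\text{failure (C1-}\ell\text{)}) + \mathbb{P}(\text{failure (C2-}\ell\text{)}) + \mathbb{P}(\text{failure (C3-}\ell\text{)}). \]

Therefore a sufficient condition for $\mathbb{P}(\text{failure C}) \le \varepsilon$ is $\sum_{\ell=1}^{L} p_1(\ell) + p_2(\ell) + p_3(\ell) \le \varepsilon$, which holds provided that $p_1(\ell) \le \varepsilon/3L$, $p_2(\ell) \le \varepsilon/3L$ and $p_3(\ell) \le \varepsilon/3L$ for every $\ell = 1,\dots,L$. By Lemma C.2, condition $p_1(\ell) \le \varepsilon/3L$ is satisfied if

\[ m_\ell \ge 32\, \Gamma(S, \pi) \left( \ln\left(\frac{3L}{\varepsilon}\right) + \frac{1}{4} \right). \]

By Lemma C.3, condition $p_2(\ell) \le \varepsilon/3L$ is satisfied if

\[ m_\ell \ge 101\, \Gamma(S, \pi) \ln\left(\frac{12 n L}{\varepsilon}\right). \]

By Lemma C.4, condition $p_3(\ell) \le \varepsilon/3L$ is satisfied if

\[ m_\ell \ge 101\, \Gamma(S, \pi) \ln\left(\frac{12 s L}{\varepsilon}\right). \]

Overall, condition

\[ m_\ell \ge 101\, \Gamma(S, \pi) \ln\left(\frac{12 n L}{\varepsilon}\right) \]

ensures that (26) and (27) are satisfied with probability $1 - \varepsilon$. Condition

\[ m = \sum_{\ell=1}^{L} m_\ell \ge 101 \left( \frac{\ln(s)}{2 \ln(2)} + 3 \right) \Gamma(S, \pi) \ln\left( 12 n L \varepsilon^{-1} \right) \tag{28} \]

will imply the previous condition on each $m_\ell$. Condition (28) can be simplified into

\[ m \ge 73 \cdot \Gamma(S, \pi) \ln(64 s) \left( \ln\left(\frac{9n}{\varepsilon}\right) + \ln \ln(64 s) \right). \tag{29} \]

The latter condition ensures that the random vector $v$, defined by (25), satisfies Assumptions (20) of Lemma A.1 with probability larger than $1 - \varepsilon$.

Hence, we have shown that if conditions (21), (22) and (29) are satisfied, then Assumptions (19) and (20) of Lemma A.1 simultaneously hold with probability larger than $1 - 3\varepsilon$. Note that bound (29) implies (21) and (22).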
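The simplification leading to the constants $73$ and $9$ can be traced as follows. This is a sketch, assuming $L = 2 + \lceil \ln(s)/(2\ln 2) \rceil$ and a per-round requirement of the form $m_\ell \ge 101\, \Gamma \ln(12 n L / \varepsilon)$:

\[
\begin{aligned}
L &\le 3 + \frac{\ln(s)}{2 \ln 2} = \frac{\ln(64 s)}{2 \ln 2}, \\
101 \Big( \frac{\ln(s)}{2 \ln 2} + 3 \Big) &= \frac{101}{2 \ln 2} \ln(64 s) \approx 72.86 \ln(64 s) \le 73 \ln(64 s), \\
12 L &\le \frac{6}{\ln 2} \ln(64 s) \approx 8.66 \ln(64 s) \le 9 \ln(64 s), \\
\ln\Big( \frac{12 n L}{\varepsilon} \Big) &\le \ln\Big( \frac{9 n}{\varepsilon} \Big) + \ln \ln(64 s).
\end{aligned}
\]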


B Bernstein’s inequalities 

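Theorem B.1 (Scalar Bernstein Inequality). Let $x_1,\dots,x_m$ be independent real-valued, zero-mean, random variables such that $|x_\ell| \le K$ almost surely for every $\ell \in \{1,\dots,m\}$. Assume that $\mathbb{E}|x_\ell|^2 \le \sigma_\ell^2$ for $\ell \in \{1,\dots,m\}$. Then for all $t > 0$,

\[ \mathbb{P}\left( \Big| \sum_{\ell=1}^{m} x_\ell \Big| \ge t \right) \le 2 \exp\left( - \frac{t^2/2}{\sigma^2 + K t/3} \right), \]

with $\sigma^2 \ge \sum_{\ell=1}^{m} \sigma_\ell^2$.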
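A quick Monte Carlo experiment illustrates how the scalar Bernstein bound compares with an empirical tail. The sketch below uses Rademacher signs, for which $K = 1$ and $\sigma^2 = m$; the helper names, the sample sizes and the seed are arbitrary choices made for the illustration.

```python
import math
import random

def bernstein_bound(t, sigma2, K):
    """Right-hand side 2 exp(-(t^2/2) / (sigma^2 + K t / 3)) of the scalar inequality."""
    return 2.0 * math.exp(-(t * t / 2.0) / (sigma2 + K * t / 3.0))

def empirical_tail(m, t, trials, rng):
    """Fraction of trials in which |sum of m Rademacher signs| >= t."""
    hits = 0
    for _ in range(trials):
        total = sum(rng.choice((-1, 1)) for _ in range(m))
        if abs(total) >= t:
            hits += 1
    return hits / trials

rng = random.Random(1)
m, t = 100, 25.0
# Rademacher signs satisfy |x_l| <= 1 and E|x_l|^2 = 1, hence K = 1 and sigma^2 = m.
empirical = empirical_tail(m, t, 5000, rng)
bound = bernstein_bound(t, float(m), 1.0)
print(empirical, bound)
```

The empirical frequency sits well below the bound, as expected from an inequality that only uses the variance and an almost-sure bound on the summands.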


Theorem B.2 (Vector Bernstein Inequality (V1)). [CP11, Theorem 2.6] Let $(y_k)_{1 \le k \le m}$ be a finite sequence of independent random complex vectors of dimension $n$. Suppose that $\mathbb{E} y_k = 0$ and $\|y_k\|_2 \le K$ a.s. for some constant $K > 0$ and set $\sigma^2 \ge \sum_k \mathbb{E}\|y_k\|_2^2$. Let $Z = \| \sum_{k=1}^{m} y_k \|_2$. Then, for any $0 < t \le \sigma^2/K$, we have that

\[ \mathbb{P}(Z \ge t) \le \exp\left( - \frac{(t/\sigma - 1)^2}{4} \right) \le \exp\left( - \frac{t^2}{8 \sigma^2} + \frac{1}{4} \right). \]

Theorem B.3 (Bernstein Inequality for self-adjoint matrices). Let $(Z_k)_{1 \le k \le n}$ be a finite sequence of independent, random, self-adjoint matrices of dimension $d$, and let $(A_k)_{1 \le k \le n}$ be a sequence of fixed self-adjoint matrices. Suppose that $Z_k$ is such that $\mathbb{E} Z_k = 0$ and $\|Z_k\|_{2 \to 2} \le K$ a.s. for some constant $K > 0$ that is independent of $k$. Moreover, assume that $\mathbb{E} Z_k^2 \preceq A_k^2$ for each $1 \le k \le n$. Define

\[ \sigma^2 = \Big\| \sum_{k=1}^{n} A_k^2 \Big\|_{2 \to 2}. \]

Then, for any $t > 0$, we have that

\[ \mathbb{P}\left( \Big\| \sum_{k=1}^{n} Z_k \Big\|_{2 \to 2} \ge t \right) \le d \exp\left( - \frac{t^2/2}{\sigma^2 + K t/3} \right). \]

Proof. This result is an application of the techniques developed in [Tro12] to obtain tail bounds for sums of random matrices. Our arguments follow those in the proof of Theorem 6.1 in [Tro12]. We assume that $K = 1$ since the general result follows by a scaling argument. Using the assumption that $\mathbb{E} Z_k^2 \preceq A_k^2$, and by applying the arguments in the proof of Lemma 6.7 in [Tro12], we obtain that

\[ \mathbb{E} \exp(\theta Z_k) \preceq \exp\big( g(\theta) A_k^2 \big), \]

for any real $\theta > 0$, where $g(\theta) = e^\theta - \theta - 1$, and the notation $\exp(A)$ denotes the matrix exponential of a self-adjoint matrix $A$ (see [Tro12] for further details). Therefore, by Corollary 3.7 in [Tro12], it follows that

\[ \mathbb{P}\left( \Big\| \sum_{k=1}^{n} Z_k \Big\|_{2 \to 2} \ge t \right) \le d \inf_{\theta > 0} \left\{ e^{-\theta t + \sigma^2 g(\theta)} \right\}, \tag{30} \]

where $\sigma^2 = \| \sum_{k=1}^{n} A_k^2 \|_{2 \to 2}$. To conclude, we follow the proof of Theorem 6.1 in [Tro12]. The function $\theta \mapsto -\theta t + \sigma^2 g(\theta)$ attains its minimum for $\theta = \ln(1 + t/\sigma^2)$, which implies that the minimal value of the right-hand side of Inequality (30) is $d \exp(-\sigma^2 h(t/\sigma^2))$ where $h(u) = (1 + u) \ln(1 + u) - u$ for $u \ge 0$. To complete the proof, it suffices to use the standard lower bound $h(u) \ge \frac{u^2/2}{1 + u/3}$ for $u \ge 0$. ■
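The lower bound on $h$ used in the last step can be checked numerically on a grid. This is a sanity check rather than a proof, and the grid range is an arbitrary choice.

```python
import math

def h(u):
    """h(u) = (1 + u) ln(1 + u) - u."""
    return (1.0 + u) * math.log1p(u) - u

def bernstein_lower(u):
    """(u^2 / 2) / (1 + u / 3), the lower bound that yields the Bernstein form."""
    return (u * u / 2.0) / (1.0 + u / 3.0)

grid = [k / 100.0 for k in range(0, 100001)]        # u in [0, 1000]
worst_gap = min(h(u) - bernstein_lower(u) for u in grid)
print(worst_gap)
```

The smallest gap occurs at $u = 0$, where both sides vanish; for small $u$ the two expressions agree up to third order, which is why the Bernstein form loses little in the small-deviation regime.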


C Estimates: auxiliary results 

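Let $S$ be the support of the signal to be reconstructed such that $|S| = s$. We set

\[ \Lambda(S, \pi) := \max_{1 \le k \le M} \frac{\|B_{k,S}^* B_{k,S}\|_{2 \to 2}}{\pi_k}. \]

Note that

\[ \|B_{k,S}^* B_{k,S}\|_{2 \to 2} \le \|B_{k,S}^* B_{k,S}\|_{\infty \to \infty} \le \|B_k^* B_{k,S}\|_{\infty \to \infty}, \]

therefore,

\[ \Lambda(S, \pi) \le \Theta(S, \pi). \]

To make the notation less cluttered, we will write $\Lambda$, $\Theta$, $\Upsilon$ and $\Gamma$ instead of $\Lambda(S, \pi)$, $\Theta(S, \pi)$, $\Upsilon(S, \pi)$ and $\Gamma(S, \pi)$.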

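Lemma C.1. Let $S \subset \{1,\dots,n\}$ be of cardinality $s$. Suppose that $\Theta \ge 1$. Then, for any $\delta > 0$, one has that

\[ \mathbb{P}\left( \|A_S^* A_S - P_S\|_{2 \to 2} \ge \delta \right) \le 2s \exp\left( - \frac{m \delta^2 / 2}{\Theta + \Theta \delta / 3} \right). \tag{E1} \]

Proof. We decompose the matrix $A_S^* A_S - P_S$ as

\[ A_S^* A_S - P_S = \frac{1}{m} \sum_{k=1}^{m} \left( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S \right) = \frac{1}{m} \sum_{k=1}^{m} X_k, \]

where $X_k := \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S$. It is clear that $\mathbb{E} X_k = 0$, and since for all $1 \le k \le M$, $\frac{\|B_{k,S}^* B_{k,S}\|_{2 \to 2}}{\pi_k} \le \Lambda \le \Theta$, we have that

\[ \|X_k\|_{2 \to 2} \le \max\left( \frac{\|B_{J_k,S}^* B_{J_k,S}\|_{2 \to 2}}{\pi_{J_k}} - 1,\ 1 \right) \le \Theta. \]

Lastly, we remark that

\[ 0 \preceq \mathbb{E} X_k^2 = \mathbb{E}\left[ \left( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} \right)^2 \right] - P_S \preceq \max_{1 \le k \le M} \frac{\|B_{k,S}^* B_{k,S}\|_{2 \to 2}}{\pi_k}\ \mathbb{E}\left[ \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} \right] = \Lambda P_S \preceq \Theta P_S. \]

Therefore, using Theorem B.3 we can set $\sigma^2 = \| \sum_{k=1}^{m} \mathbb{E} X_k^2 \|_{2 \to 2} \le m \Theta$. Hence, inequality (E1) immediately follows from Bernstein's inequality for random matrices (see Theorem B.3). ■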

Lemma C.2. Let $S \subset \{1,\dots,n\}$, such that $|S| = s$. Let $w$ be a vector in $\mathbb{C}^n$. Then, for any $0 < t \le 1$, one has that

\[ \mathbb{P}\left( \|(A_S^* A_S - P_S) w\|_2 \ge t \|w\|_2 \right) \le \exp\left( - \frac{m t^2}{8 \Theta} + \frac{1}{4} \right). \tag{E2} \]

Proof. Without loss of generality we may assume that $\|w\|_2 = 1$. We remark that

\[ (A_S^* A_S - P_S) w = \frac{1}{m} \sum_{k=1}^{m} \left( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S \right) w = \frac{1}{m} \sum_{k=1}^{m} y_k, \]

where $y_k = \left( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S \right) w$ is a random vector with zero mean. Simple calculations yield that

\[
\begin{aligned}
\Big\| \frac{1}{m} y_k \Big\|_2^2
&= \frac{1}{m^2} \left( w^* \Big( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} \Big)^2 w - 2 w^* \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} w + w^* w \right) \\
&\le \frac{1}{m^2} \left( \Lambda\, w^* \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} w - 2 w^* \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} w + 1 \right)
= \frac{1}{m^2} \left( (\Lambda - 2)\, w^* \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} w + 1 \right) \\
&\le \frac{1}{m^2} \big( (\Lambda - 2) \Lambda \|w\|_2^2 + 1 \big) = \frac{1}{m^2} \big( (\Lambda - 2) \Lambda + 1 \big)
= \frac{1}{m^2} (\Lambda - 1)^2 \le \frac{1}{m^2} \Lambda^2 \le \frac{1}{m^2} \Theta^2 .
\end{aligned}
\]

Now, let us define $Z = \| \frac{1}{m} \sum_{k=1}^{m} y_k \|_2$. By independence of the random vectors $y_k$, it follows that

\[ \mathbb{E}[Z^2] = \frac{1}{m} \mathbb{E} \|y_1\|_2^2 = \frac{1}{m} \mathbb{E} \left( \Big\langle \Big( \frac{B_{J,S}^* B_{J,S}}{\pi_J} \Big)^2 w, w \Big\rangle - 2 \frac{\|B_{J,S} w\|_2^2}{\pi_J} + 1 \right). \]

To bound the first term in the above equality, one can write

\[ \mathbb{E} \Big\langle \Big( \frac{B_{J,S}^* B_{J,S}}{\pi_J} \Big)^2 w, w \Big\rangle \le \Lambda\, \mathbb{E} \Big\langle \frac{B_{J,S}^* B_{J,S}}{\pi_J} w, w \Big\rangle \le \Lambda \|w\|_2^2 \le \Theta. \]

One immediately has that $\mathbb{E} \frac{\|B_{J,S} w\|_2^2}{\pi_J} = \|w\|_2^2 = 1$. Therefore, one finally obtains that

\[ \mathbb{E}[Z^2] \le \frac{\Theta - 1}{m} \le \frac{\Theta}{m}. \]

Using the above upper bounds, namely $\| \frac{1}{m} y_k \|_2 \le \frac{\Theta}{m}$ and $\mathbb{E}[Z^2] \le \frac{\Theta}{m}$, the result of the lemma is thus a consequence of Bernstein's inequality for random vectors (see Theorem B.2), which completes the proof. ■

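Lemma C.3. Let $S \subset \{1,\dots,n\}$, such that $|S| = s$. Let $v$ be a vector of $\mathbb{C}^n$. Then we have

\[ \mathbb{P}\left( \|A_{S^c}^* A_S v\|_\infty \ge t \|v\|_\infty \right) \le 4n \exp\left( - \frac{m t^2 / 4}{\Upsilon + \Theta t / 3} \right). \tag{E3} \]

Proof. Suppose without loss of generality that $\|v\|_\infty = 1$. Then,

\[ \|A_{S^c}^* A_S v\|_\infty = \max_{i \in S^c} |\langle e_i, A^* A_S v \rangle| = \max_{i \in S^c} \frac{1}{m} \left| \sum_{k=1}^{m} \Big\langle e_i, \frac{B_{J_k}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle \right|. \]

Let us define $Z_k = \Big\langle e_i, \frac{B_{J_k}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle$. Note that $\mathbb{E} Z_k = 0$, since for $i \in S^c$, $\mathbb{E} \Big\langle e_i, \frac{B_{J_k}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle = e_i^* \sum_{k=1}^{M} B_k^* B_{k,S} v = e_i^* P_S v = 0$. From Hölder's inequality, we get

\[ |Z_k| = \left| e_i^* \frac{B_{J_k}^* B_{J_k,S}}{\pi_{J_k}} v \right| \le \left\| e_i^* \frac{B_{J_k}^* B_{J_k,S}}{\pi_{J_k}} \right\|_1 \|v\|_\infty \le \max_{j \in S^c,\ 1 \le k \le M} \frac{1}{\pi_k} \|e_j^* B_k^* B_{k,S}\|_1 \le \Theta. \]

Furthermore,

\[ \mathbb{E} |Z_k|^2 = \mathbb{E} \left| \Big\langle e_i, \frac{B_{J_k}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle \right|^2 = \sum_{\ell=1}^{M} \frac{|e_i^* B_\ell^* B_{\ell,S} v|^2}{\pi_\ell} \le \Upsilon. \]

Therefore $\sum_{k=1}^{m} \mathbb{E} |Z_k|^2 \le m \Upsilon$. Using the real-valued Bernstein inequality (Theorem B.1) in the case of complex random variables, we obtain

\[ \mathbb{P}\left( \frac{1}{m} \Big| \sum_{k=1}^{m} Z_k \Big| \ge t \right) \le \mathbb{P}\left( \frac{1}{m} \Big| \sum_{k=1}^{m} \mathrm{Re}\, Z_k \Big| \ge \frac{t}{\sqrt{2}} \right) + \mathbb{P}\left( \frac{1}{m} \Big| \sum_{k=1}^{m} \mathrm{Im}\, Z_k \Big| \ge \frac{t}{\sqrt{2}} \right) \le 4 \exp\left( - \frac{m t^2 / 4}{\Upsilon + \Theta t / 3} \right). \]

Taking the union bound over $i \in S^c$ completes the proof. ■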

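Lemma C.4. Let $S \subset \{1,\dots,n\}$, such that $|S| = s$. Suppose that $\Theta \ge 1$. Let $v$ be a vector of $\mathbb{C}^n$. Then we have

\[ \mathbb{P}\left( \|(A_S^* A_S - P_S) v\|_\infty \ge t \|v\|_\infty \right) \le 4s \exp\left( - \frac{m t^2 / 4}{\Upsilon + \Theta t / 3} \right). \tag{E4} \]

Proof. Suppose without loss of generality that $\|v\|_\infty = 1$. Then,

\[ \|(A_S^* A_S - P_S) v\|_\infty = \max_{i \in S} |\langle e_i, (A_S^* A_S - P_S) v \rangle| = \max_{i \in S} \frac{1}{m} \left| \sum_{k=1}^{m} \Big\langle e_i, \Big( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S \Big) v \Big\rangle \right|. \]

Let us define $Z_k = \Big\langle e_i, \Big( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S \Big) v \Big\rangle$. Note that $\mathbb{E} Z_k = 0$. From Hölder's inequality, we get

\[ |Z_k| = \left| \Big\langle e_i, \Big( \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S \Big) v \Big\rangle \right| \le \left\| \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} - P_S \right\|_{\infty \to \infty} \le \max(\Theta - 1, 1) \le \Theta, \]

since $\|B_{k,S}^* B_{k,S}\|_{\infty \to \infty} \le \|B_k^* B_{k,S}\|_{\infty \to \infty}$, and using the same argument as in Lemma C.3. Furthermore,

\[
\begin{aligned}
\mathbb{E} |Z_k|^2
&= \mathbb{E} \left| \Big\langle e_i, \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle \right|^2 + |\langle e_i, v \rangle|^2
- \overline{\langle e_i, v \rangle}\, \mathbb{E} \Big\langle e_i, \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle
- \langle e_i, v \rangle\, \overline{\mathbb{E} \Big\langle e_i, \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle} \\
&= \mathbb{E} \left| \Big\langle e_i, \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle \right|^2 - |\langle e_i, v \rangle|^2
\le \mathbb{E} \left| \Big\langle e_i, \frac{B_{J_k,S}^* B_{J_k,S}}{\pi_{J_k}} v \Big\rangle \right|^2
= \sum_{\ell=1}^{M} \frac{|e_i^* B_{\ell,S}^* B_{\ell,S} v|^2}{\pi_\ell} \le \Upsilon.
\end{aligned}
\]

Therefore, $\sum_{k=1}^{m} \mathbb{E} |Z_k|^2 \le m \Upsilon$, and using the real-valued Bernstein inequality (Theorem B.1) in the case of complex random variables, we obtain

\[ \mathbb{P}\left( \frac{1}{m} \Big| \sum_{k=1}^{m} Z_k \Big| \ge t \right) \le \mathbb{P}\left( \frac{1}{m} \Big| \sum_{k=1}^{m} \mathrm{Re}\, Z_k \Big| \ge \frac{t}{\sqrt{2}} \right) + \mathbb{P}\left( \frac{1}{m} \Big| \sum_{k=1}^{m} \mathrm{Im}\, Z_k \Big| \ge \frac{t}{\sqrt{2}} \right) \le 4 \exp\left( - \frac{m t^2 / 4}{\Upsilon + \Theta t / 3} \right). \]

Taking the union bound over $i \in S$ completes the proof. ■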

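Lemma C.5. Let $S$ be a subset of $\{1,\dots,n\}$. Then, for any $0 < t \le 1$, one has that

\[ \mathbb{P}\left( \max_{i \in S^c} \|A_S^* A e_i\|_2 \ge t \right) \le n \exp\left( - \frac{\big( \sqrt{m/\Theta}\, t - 1 \big)^2}{4} \right). \tag{E5} \]

Proof. Let us fix some $i \in S^c$. For $k = 1,\dots,m$, we define the random vector

\[ x_k := \frac{B_{J_k,S}^* B_{J_k}}{\pi_{J_k}} e_i. \]

Then, since $i \in S^c$ one easily gets $\mathbb{E} x_k = \sum_{\ell=1}^{M} B_{\ell,S}^* B_\ell e_i = \sum_{\ell=1}^{M} (B_\ell P_S)^* B_\ell e_i = P_S \sum_{\ell=1}^{M} B_\ell^* B_\ell e_i = P_S e_i = 0$ (note that $P_S$ is self-adjoint). In addition, we can write

\[ \|A_S^* A e_i\|_2 = \Big\| \frac{1}{m} \sum_{k=1}^{m} \frac{B_{J_k,S}^* B_{J_k}}{\pi_{J_k}} e_i \Big\|_2 = \frac{1}{m} \Big\| \sum_{k=1}^{m} x_k \Big\|_2. \]

Then,

\[ \|x_k\|_2 = \Big\| \frac{B_{J_k,S}^* B_{J_k}}{\pi_{J_k}} e_i \Big\|_2 \le \Big\| \frac{B_{J_k,S}^* B_{J_k}}{\pi_{J_k}} e_i \Big\|_1 \le \frac{1}{\pi_{J_k}} \|B_{J_k}^* B_{J_k,S}\|_{\infty \to \infty} \le \Theta. \]

Furthermore, one has that

\[ \mathbb{E} \|x_k\|_2^2 = \mathbb{E} \Big\| \frac{B_{J_k,S}^* B_{J_k}}{\pi_{J_k}} e_i \Big\|_2^2 \le \mathbb{E} \left[ \Big\| \frac{B_{J_k,S}}{\sqrt{\pi_{J_k}}} \Big\|_{2 \to 2}^2 \Big\| \frac{B_{J_k}}{\sqrt{\pi_{J_k}}} e_i \Big\|_2^2 \right] \le \Lambda\, \mathbb{E} \Big\| \frac{B_{J_k}}{\sqrt{\pi_{J_k}}} e_i \Big\|_2^2 = \Lambda \|e_i\|_2^2 = \Lambda, \]

so that $\sum_{k=1}^{m} \mathbb{E} \|x_k\|_2^2 \le m \Lambda \le m \Theta$.

Hence, using the above upper bounds, it follows from Bernstein's inequality for random vectors (see Theorem B.2) that

\[ \mathbb{P}\left( \|A_S^* A e_i\|_2 \ge t \right) \le \exp\left( - \frac{\big( \sqrt{m/\Theta}\, t - 1 \big)^2}{4} \right). \]

Finally, Inequality (E5) follows from a union bound over $i \in S^c$, which completes the proof. ■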


D Proof of results in Applications 

D.l Proof of Corollary |4.1| 

The proof relies on the evaluation of 0 and T in the case of isolated measurements. In this case, 
we have n blocks composed of isolated measurements. Then, each block corresponds to one of 
the rows (a* k )i< k < n of A 0 . Recall that llafcO^slloo-Kjo = maxi<j< n sup^n^! \e*a k a* k S v\, so the 
norm ||afe a fc sIIoo-kxj is the maximum 1 1 -norm of the rows of the matrix a k a k s . Therefore, the 
quantities in Definition |3.1| can be rewritten as follows 


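$$\Theta(S,\pi) := \max_{1 \le k \le n} \frac{\|a_k a_{k,S}^*\|_{\infty\to\infty}}{\pi_k} = \max_{1 \le k \le n} \frac{\|a_k\|_\infty \|a_{k,S}\|_1}{\pi_k} \le s \cdot \max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k}, \tag{31}$$

$$\begin{aligned}
\Upsilon(S,\pi) &= \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{1}{\pi_k} |e_i^* a_k|^2 |a_{k,S}^* v|^2 \\
&\le \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{\|a_k\|_\infty^2}{\pi_k} |a_{k,S}^* v|^2 \\
&\le \sup_{\|v\|_\infty \le 1} \max_{1 \le \ell \le n} \frac{\|a_\ell\|_\infty^2}{\pi_\ell} \sum_{k=1}^{n} |a_{k,S}^* v|^2 \\
&= \sup_{\|v\|_\infty \le 1} \|A_0 P_S v\|_2^2 \, \max_{1 \le \ell \le n} \frac{\|a_\ell\|_\infty^2}{\pi_\ell} \\
&= \sup_{\|v\|_\infty \le 1} \|P_S v\|_2^2 \, \max_{1 \le \ell \le n} \frac{\|a_\ell\|_\infty^2}{\pi_\ell} \le s \cdot \max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k}.
\end{aligned} \tag{32}$$
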

Therefore we can choose $\Gamma(S,\pi) = s \cdot \max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k}$, and the result follows by Theorem 3.3.
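The bounds (31) and (32) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it takes $A_0$ to be the unitary DFT of size $n = 8$, a hypothetical support `S`, and uniform drawing probabilities; the function names are ours, not the paper's. The supremum over $v$ is only scanned over sign vectors supported on $S$, so the value obtained for $\Upsilon$ is a lower estimate.

```python
import cmath
import itertools


def unitary_dft(n):
    """Rows a_k^* of the unitary DFT matrix A_0 of size n."""
    return [[cmath.exp(-2j * cmath.pi * k * l / n) / n ** 0.5 for l in range(n)]
            for k in range(n)]


def theta_isolated(A0, S, pi):
    """Theta(S, pi) = max_k ||a_k||_inf * ||a_{k,S}||_1 / pi_k, as in (31)."""
    return max(max(abs(x) for x in row) * sum(abs(row[j]) for j in S) / p
               for row, p in zip(A0, pi))


def upsilon_at(A0, S, pi, v):
    """max_i sum_k |e_i^* a_k|^2 |a_{k,S}^* v|^2 / pi_k for one fixed v, as in (32)."""
    n = len(A0)
    proj = [abs(sum(A0[k][j] * v[j] for j in S)) ** 2 for k in range(n)]
    return max(sum(abs(A0[k][i]) ** 2 * proj[k] / pi[k] for k in range(n))
               for i in range(n))


def common_bound(A0, S, pi):
    """s * max_k ||a_k||_inf^2 / pi_k, the right-hand side of (31) and (32)."""
    return len(S) * max(max(abs(x) for x in row) ** 2 / p
                        for row, p in zip(A0, pi))


n, S = 8, [0, 3, 5]            # hypothetical size and support (0-based indices)
A0 = unitary_dft(n)
pi = [1.0 / n] * n             # uniform drawing probabilities (assumption)
bound = common_bound(A0, S, pi)
theta = theta_isolated(A0, S, pi)
# Sign vectors supported on S: a lower estimate of the supremum over v.
upsilon_est = max(
    upsilon_at(A0, S, pi, dict(zip(S, signs)))
    for signs in itertools.product([-1.0, 1.0], repeat=len(S)))
print(theta, upsilon_est, bound)
```

In this particular flat-magnitude case all three quantities should coincide with $s = 3$ up to rounding, which illustrates that the bound $s \cdot \max_k \|a_k\|_\infty^2 / \pi_k$ can be attained.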


D.2 Around Corollary 4.3

D.2.1 Proof of Corollary 4.3

Again, this is all about evaluating $\Theta$ and $\Upsilon$ in this specific case. Concerning the evaluation of $\Upsilon$, we can use the expression (32) to conclude that

$$\Upsilon(S,\pi) = \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{1}{\pi_k} |e_i^* a_k|^2 |a_{k,S}^* v|^2 .$$
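To control $\Theta$, using (31), it suffices to write:

$$\Theta(S,\pi) = \max_{1 \le k \le n} \frac{\|a_k a_{k,S}^*\|_{\infty\to\infty}}{\pi_k} \le \max_{1 \le k \le n} \frac{\|a_k\|_\infty \|a_{k,S}\|_1}{\pi_k} \le \max_{1 \le k \le n} \frac{\|a_k\|_\infty \sum_{\ell=1}^{N} \|a_{k,\Omega_\ell}\|_\infty s_\ell}{\pi_k}.$$

By Theorem 3.3, the two conditions

$$m \ge C \Big( \max_{1 \le k \le n} \frac{\sum_{\ell=1}^{N} s_\ell \|a_{k,\Omega_\ell}\|_\infty \|a_k\|_\infty}{\pi_k} \Big) \ln(s) \ln(n/\varepsilon),$$

$$m \ge C \Big( \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{1}{\pi_k} |e_i^* a_k|^2 |a_{k,S}^* v|^2 \Big) \ln(s) \ln(n/\varepsilon)$$

lead to the desired conclusion.
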


D.2.2 Comparison of Corollary 4.3 and the results in [AHPR13]

Note that the sampling in [AHPR13] is based on Bernoulli drawings structured by level. Their results are then easily transposable to the case of i.i.d. sampling with constant probability by level. The first condition on $m$ in Corollary 4.3 is similar to condition (4.4) in Theorem 4.4 of [AHPR13], since we recognize the term $\|a_{k,\Omega_\ell}\|_\infty \|a_k\|_\infty$ as the $(k,\ell)$-local coherence defined in [AHPR13]. Let us show that the second condition on $m$ is similar to equation (4.5) in [AHPR13].
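First, observe that

$$\max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{1}{\pi_k} |e_i^* a_k|^2 |a_{k,S}^* v|^2 \le \max_{1 \le \ell \le N} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{1}{\pi_k} \|a_{k,\Omega_\ell}\|_\infty^2 |a_{k,S}^* v|^2 \le \max_{1 \le \ell \le N} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{1}{\pi_k} \|a_{k,\Omega_\ell}\|_\infty \|a_k\|_\infty |a_{k,S}^* v|^2 .$$

Let $v$ denote the maximizer in the last expression, and define $s_k = |a_{k,S}^* v|^2$ for $1 \le k \le n$. It follows that

$$\max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{n} \frac{1}{\pi_k} |e_i^* a_k|^2 |a_{k,S}^* v|^2 \le \max_{1 \le \ell \le N} \sum_{k=1}^{n} \frac{1}{\pi_k} \|a_{k,\Omega_\ell}\|_\infty \|a_k\|_\infty \, s_k, \tag{33}$$

and $\sum_{k=1}^{n} s_k = \sum_{k=1}^{n} |a_{k,S}^* v|^2 = \|A_0 P_S v\|_2^2 \le s$. The last inequality and Equation (33) for i.i.d. sampling correspond to condition (4.5) in Theorem 4.4 of [AHPR13] in the case of Bernoulli sampling. This completes the comparison between Corollary 4.3 and the results in [AHPR13].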


D.3 Proof of Corollary 4.4

Recall that $(\Omega_j)_{0 \le j \le J}$ is the dyadic partition of the set of indexes $\{1, \ldots, n\}$. Recall also the function $j : \{1, \ldots, n\} \to \{0, \ldots, J\}$ defined by $j(u) = j$ if $u \in \Omega_j$. In the interests of simplifying notation, in this section, the symbol $\gtrsim$ will be equivalent to "$\ge C \cdot$", with $C$ a universal constant. The following lemma will be useful to bound above the coefficients of $A_0$ in absolute value, and to derive Lemmas D.2 and D.3.

Lemma D.1. [AHR14a] The magnitude of the coefficients of the matrix $A_0 = \mathcal{F}\phi^*$, where $\mathcal{F}$ is the 1D Fourier transform and $\phi$ is the 1D Haar transform, satisfies

$$\|P_{\Omega_j} A_0 P_{\Omega_\ell}\|_{1\to\infty} \lesssim 2^{-j/2}\, 2^{-|j-\ell|/2}, \quad \text{for } 0 \le j, \ell \le J. \tag{34}$$


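Lemma D.2. In the case of isolated measurements, with $A_0 = \mathcal{F}\phi^*$ and $\phi^*$ the inverse 1D Haar transform, suppose that the signal to reconstruct $x$ is sparse by levels, meaning that $\|P_{\Omega_j} x\|_0 \le s_j$ for $0 \le j \le J$. Then,

$$\Theta \lesssim \max_{1 \le k \le n} \frac{2^{-j(k)}}{\pi_k} \Big( s_{j(k)} + \sum_{\substack{\ell=0 \\ \ell \ne j(k)}}^{J} s_\ell\, 2^{-|j(k)-\ell|/2} \Big). \tag{35}$$

Choosing $\pi_k$ to be constant by level, i.e. $\pi_k = \tilde\pi_{j(k)}$, the last expression can be rewritten as follows:

$$\Theta \lesssim \max_{0 \le j \le J} \frac{2^{-j}}{\tilde\pi_j} \Big( s_j + \sum_{\substack{\ell=0 \\ \ell \ne j}}^{J} s_\ell\, 2^{-|j-\ell|/2} \Big).$$
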

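Proof. Using (31), we can write

$$\begin{aligned}
\Theta = \max_{1 \le k \le n} \frac{\|a_k\|_\infty \|a_{k,S}\|_1}{\pi_k}
&\le \max_{1 \le k \le n} \frac{\|a_k\|_\infty}{\pi_k} \sum_{\ell=0}^{J} \|a_{k,\Omega_\ell}\|_\infty\, s_\ell \\
&\lesssim \max_{1 \le k \le n} \frac{1}{\pi_k}\, 2^{-j(k)/2} \sum_{\ell=0}^{J} 2^{-j(k)/2}\, 2^{-|j(k)-\ell|/2}\, s_\ell \\
&= \max_{1 \le k \le n} \frac{1}{\pi_k}\, 2^{-j(k)} \sum_{\ell=0}^{J} 2^{-|j(k)-\ell|/2}\, s_\ell,
\end{aligned} \tag{36}$$

where we use (34) to bound above $\|a_k\|_\infty$ and $\|a_{k,\Omega_\ell}\|_\infty$. ∎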

Lemma D.3. In the case of isolated measurements, with $A_0 = \mathcal{F}\phi^*$ and $\phi^*$ the inverse Haar transform, suppose that the signal to reconstruct $x$ is sparse by levels, meaning that $\|P_{\Omega_j} x\|_0 \le s_j$ for $0 \le j \le J$. Choosing $\pi_k$ to be constant by level, i.e. $\pi_k = \tilde\pi_{j(k)}$, we have

$$\Upsilon \lesssim \max_{0 \le j \le J} \frac{1}{\tilde\pi_j}\, 2^{-j} \sum_{p=0}^{J} 2^{-|j-p|/2}\, s_p .$$


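Proof. Denoting $v = v(i)$ the argument of the supremum in the definition of $\Upsilon$, we get

$$\begin{aligned}
\Upsilon := \max_{1 \le i \le n} \sum_{k=1}^{n} \frac{1}{\pi_k} |e_i^* a_k|^2 |a_{k,S}^* v|^2
&\le \max_{0 \le \ell \le J} \sum_{k=1}^{n} \frac{1}{\pi_k} \|a_{k,\Omega_\ell}\|_\infty^2\, |a_{k,S}^* v|^2 \\
&\lesssim \max_{0 \le \ell \le J} \sum_{j=0}^{J} \frac{1}{\tilde\pi_j}\, 2^{-j}\, 2^{-|j-\ell|} \underbrace{\sum_{k \in \Omega_j} |a_{k,S}^* v|^2}_{=: K_j}.
\end{aligned} \tag{37}$$

We can rewrite $K_j$ as follows: $K_j = \|P_{\Omega_j} A_0 P_S v\|_2^2$. Therefore, since $\|v\|_\infty \le 1$,

$$\begin{aligned}
\sqrt{K_j} = \|P_{\Omega_j} A_0 P_S v\|_2
&= \Big\| P_{\Omega_j} A_0 \sum_{p=0}^{J} P_{\Omega_p} P_S v \Big\|_2
\le \sum_{p=0}^{J} \|P_{\Omega_j} A_0 P_{\Omega_p} P_S v\|_2 \\
&\le \sum_{p=0}^{J} \|P_{\Omega_j} A_0 P_{\Omega_p}\|_{2\to 2}\, \|P_{\Omega_p} P_S v\|_2
\le \sum_{p=0}^{J} \|P_{\Omega_j} A_0 P_{\Omega_p}\|_{2\to 2}\, \sqrt{s_p}.
\end{aligned}$$
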
p =0 p =0 
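Using Lemma 4.3 of [AHR14a], we have the following upper bound:

$$\|P_{\Omega_j} A_0 P_{\Omega_p}\|_{2\to 2} \lesssim 2^{-|j-p|/2}, \quad \text{for } 0 \le j, p \le J.$$

Then, $\sqrt{K_j} \lesssim \sum_{p=0}^{J} 2^{-|j-p|/2} \sqrt{s_p}$, and thus

$$K_j \lesssim \Big( \sum_{p=0}^{J} 2^{-|j-p|/2} \sqrt{s_p} \Big)^2 \le \Big( \sum_{p=0}^{J} 2^{-|j-p|/2} \Big) \Big( \sum_{p=0}^{J} 2^{-|j-p|/2}\, s_p \Big) \lesssim \sum_{p=0}^{J} 2^{-|j-p|/2}\, s_p,$$

where in the second inequality we use the Cauchy-Schwarz inequality. Therefore,

$$\begin{aligned}
\Upsilon &\lesssim \max_{0 \le \ell \le J} \sum_{j=0}^{J} 2^{-|j-\ell|}\, \frac{1}{\tilde\pi_j}\, 2^{-j} \sum_{p=0}^{J} 2^{-|j-p|/2}\, s_p \\
&\le \max_{0 \le \ell \le J} \Big( \sum_{j=0}^{J} 2^{-|j-\ell|} \Big) \Big( \max_{0 \le j \le J} \frac{1}{\tilde\pi_j}\, 2^{-j} \sum_{p=0}^{J} 2^{-|j-p|/2}\, s_p \Big) \\
&\lesssim \max_{0 \le j \le J} \frac{1}{\tilde\pi_j}\, 2^{-j} \sum_{p=0}^{J} 2^{-|j-p|/2}\, s_p .
\end{aligned}$$
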

Note that the upper bounds given in Lemmas D.2 and D.3 coincide. Therefore, we can apply Theorem 3.3 with the following upper bound for $\Gamma(S,\pi)$:

$$\Gamma(S,\pi) \lesssim \max_{0 \le j \le J} \frac{1}{\tilde\pi_j}\, 2^{-j} \sum_{p=0}^{J} 2^{-|j-p|/2}\, s_p,$$

and conclude the proof of Corollary 4.4.
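For the reader's convenience, the geometric sums that are absorbed into the symbol $\lesssim$ in the proof of Lemma D.3 can be made explicit. The following short computation is our own elementary check, not part of the original argument; it shows that the absorbed factors are bounded by constants independent of $j$, $\ell$ and $J$.

```latex
% For any 0 <= j <= J, splitting the sum at p = j and extending to infinity:
\sum_{p=0}^{J} 2^{-|j-p|/2}
  \;\le\; 1 + 2 \sum_{t \ge 1} 2^{-t/2}
  \;=\; 1 + \frac{2}{\sqrt{2}-1}
  \;=\; 3 + 2\sqrt{2},
\qquad
\sum_{j=0}^{J} 2^{-|j-\ell|}
  \;\le\; 1 + 2 \sum_{t \ge 1} 2^{-t}
  \;=\; 3 .
```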


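D.4 Proof of Corollary 4.7

Recall that $A_0 = \phi \otimes \phi \in \mathbb{C}^{n \times n}$, where $\phi \in \mathbb{C}^{\sqrt{n} \times \sqrt{n}}$ is a 1D orthogonal transform. Consider a blocks dictionary made of $\sqrt{n}$ horizontal lines, i.e. for $1 \le k \le \sqrt{n}$,

$$B_k = \big( \phi_{k,1}\phi, \ldots, \phi_{k,\sqrt{n}}\phi \big), \quad \text{and thus} \quad B_k^* B_k = \big( \overline{\phi_{k,i}}\, \phi_{k,j}\, \mathrm{Id}_{\sqrt{n}} \big)_{1 \le i,j \le \sqrt{n}}.$$

Now, let us assume that the signal support $S$ is concentrated on $q$ horizontal lines of the spatial plane. Formally,

$$S \subset \{ (j-1)\sqrt{n} + \{1, \ldots, \sqrt{n}\},\ j \in J \}, \tag{38}$$

where $J \subset \{1, \ldots, \sqrt{n}\}$ and $|J| = q$. Therefore,

$$B_k^* B_{k,S} = \big( \overline{\phi_{k,i}}\, \phi_{k,j}\, \delta_{j \in J}\, \mathrm{Id}_{\sqrt{n}} \big)_{1 \le i,j \le \sqrt{n}},$$

where $\delta_{j \in J} = 1$ if $j \in J$ and $0$ otherwise. In such a setting, the quantities in Definition 3.1 can be rewritten as follows:

$$\Theta(S,\pi) = \max_{1 \le k \le M} \max_{1 \le i \le n} \frac{\|e_i^* B_k^* B_{k,S}\|_1}{\pi_k} = \max_{1 \le k \le \sqrt{n}} \frac{\|\phi_{k,:}\|_\infty \sum_{j \in J} |\phi_{k,j}|}{\pi_k} \le \max_{1 \le k \le \sqrt{n}} q\, \frac{\|\phi_{k,:}\|_\infty^2}{\pi_k}. \tag{39}$$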
(39) 
Recall that

$$\Upsilon(S,\pi) := \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^{M} \frac{1}{\pi_k} |e_i^* B_k^* B_{k,S} v|^2,$$

and call $(i^\star, v)$ the argument of the supremum over $\{1, \ldots, n\}$ and $\{v,\ \|v\|_\infty \le 1\}$. Therefore,

$$\Upsilon(S,\pi) = \sum_{k=1}^{M} \frac{1}{\pi_k} |e_{i^\star}^* B_k^* B_{k,S} v|^2 .$$


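We can decompose $i^\star = (i_1 - 1)\sqrt{n} + i_2$ with $i_1, i_2$ integers of $\{1, \ldots, \sqrt{n}\}$. We can write

$$\Upsilon(S,\pi) = \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} \Big| \sum_{j=1}^{\sqrt{n}} \overline{\phi_{k,i_1}}\, \phi_{k,j}\, \delta_{j \in J}\, w_j \Big|^2 = \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} |\phi_{k,i_1}|^2 \Big| \sum_{j \in J} \phi_{k,j}\, w_j \Big|^2,$$

where $w \in \mathbb{C}^{\sqrt{n}}$ is such that $w_j = e_{i_2}^* v[j]$, and $v[j] \in \mathbb{C}^{\sqrt{n}}$ is the restriction of $v$ to the $j$-th horizontal line, i.e. to the components of $v$ indexed by $\{(j-1)\sqrt{n} + 1, \ldots, j\sqrt{n}\}$. We can rewrite the last expression as follows:

$$\begin{aligned}
\Upsilon(S,\pi) &= \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} |\phi_{k,i_1}|^2\, |\langle e_k, \phi P_J w \rangle|^2
\le \max_{1 \le \ell \le \sqrt{n}} \frac{1}{\pi_\ell} |\phi_{\ell,i_1}|^2 \sum_{k=1}^{\sqrt{n}} |\langle e_k, \phi P_J w \rangle|^2 \\
&= \max_{1 \le \ell \le \sqrt{n}} \frac{1}{\pi_\ell} |\phi_{\ell,i_1}|^2\, \|\phi P_J w\|_2^2
= \max_{1 \le \ell \le \sqrt{n}} \frac{1}{\pi_\ell} |\phi_{\ell,i_1}|^2\, \|P_J w\|_2^2 \\
&\le \max_{1 \le \ell \le \sqrt{n}} \frac{1}{\pi_\ell} \|\phi_{\ell,:}\|_\infty^2 \cdot q,
\end{aligned}$$

where in the last expression we use that $\|v\|_\infty \le 1$. Choosing $\phi$ as the 1D Fourier transform gives $\|\phi_{k,:}\|_\infty = n^{-1/4}$, and choosing a uniform sampling among the $\sqrt{n}$ horizontal lines, i.e. $\pi_\ell = 1/\sqrt{n}$ for $1 \le \ell \le \sqrt{n}$, leads to

$$\Gamma(S,\pi) \le q,$$

which ends the proof of Corollary 4.7.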
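The value of $\Theta$ in this setting can be checked by brute force on a small grid. The sketch below is an illustration under assumptions of ours: $\phi$ is the unitary DFT of size $\sqrt{n} = 4$, the support fills the two (0-based) lines in a hypothetical set `J`, and the lines are drawn uniformly. It builds $A_0 = \phi \otimes \phi$ explicitly and computes the maximal row $\ell_1$-norm of $B_k^* B_{k,S}$ divided by $\pi_k$.

```python
import cmath


def unitary_dft(r):
    """Unitary DFT matrix of size r, as a list of rows."""
    return [[cmath.exp(-2j * cmath.pi * a * b / r) / r ** 0.5 for b in range(r)]
            for a in range(r)]


def kron(P, Q):
    """Kronecker product of two matrices given as lists of rows."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def theta_horizontal_lines(r, J, pi):
    """Theta(S, pi) for A_0 = phi (x) phi, blocks = horizontal lines, S = lines in J."""
    A0 = kron(unitary_dft(r), unitary_dft(r))
    n = r * r
    S = [j * r + t for j in J for t in range(r)]   # support: full lines in J
    best = 0.0
    for k in range(r):
        Bk = A0[k * r:(k + 1) * r]                 # k-th block of r rows
        for i in range(n):
            # l1-norm of row i of B_k^* B_{k,S}
            row_norm = sum(
                abs(sum(Bk[a][i].conjugate() * Bk[a][l] for a in range(r)))
                for l in S)
            best = max(best, row_norm / pi[k])
    return best


r, J = 4, [0, 2]                                   # hypothetical example, q = 2
theta = theta_horizontal_lines(r, J, [1.0 / r] * r)
print(theta)
```

With these choices the computed value should equal $q = 2$ up to rounding, in agreement with (39) and the bound $\Gamma(S,\pi) \le q$.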


D.5 Proof of Corollary 4.9

We recall that the sampling matrix is constructed from the full sampling matrix $A_0 \in \mathbb{C}^{n \times n}$, in the 2D setting, where $A_0 = \mathcal{F}_{2D} \Psi^*$ with $\mathcal{F}_{2D} \in \mathbb{C}^{n \times n}$ the 2D Fourier transform and $\Psi^* \in \mathbb{C}^{n \times n}$ the 2D inverse wavelet transform. Since both transforms are separable, $\mathcal{F}_{2D} = \mathcal{F} \otimes \mathcal{F}$ and $\Psi = \psi \otimes \psi$, with $\otimes$ the Kronecker product and $\mathcal{F}, \psi \in \mathbb{C}^{\sqrt{n} \times \sqrt{n}}$ the corresponding 1D transforms. Then $A_0$ can also be rewritten as $A_0 = \phi \otimes \phi$, the Kronecker product of the 1D transforms $\phi := \mathcal{F}\psi^* \in \mathbb{C}^{\sqrt{n} \times \sqrt{n}}$.
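The factorization of $A_0$ as a Kronecker product follows from the mixed-product property $(\mathcal{F} \otimes \mathcal{F})(\psi \otimes \psi)^* = (\mathcal{F}\psi^*) \otimes (\mathcal{F}\psi^*)$. The sketch below checks this identity numerically for $\sqrt{n} = 4$; the choice of a DFT for $\mathcal{F}$ and of a 4-point orthonormal Haar matrix for $\psi$ is only an illustrative assumption, and the helper names are ours.

```python
import cmath


def unitary_dft(r):
    """Unitary DFT matrix of size r, as a list of rows."""
    return [[cmath.exp(-2j * cmath.pi * a * b / r) / r ** 0.5 for b in range(r)]
            for a in range(r)]


def haar4():
    """Orthonormal 4-point Haar matrix (rows are the Haar basis vectors)."""
    h, s = 0.5, 2 ** -0.5
    return [[h, h, h, h], [h, h, -h, -h], [s, -s, 0.0, 0.0], [0.0, 0.0, s, -s]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def adjoint(P):
    """Conjugate transpose."""
    return [[P[i][j].conjugate() for i in range(len(P))] for j in range(len(P[0]))]


def kron(P, Q):
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


F, psi = unitary_dft(4), haar4()
lhs = matmul(kron(F, F), adjoint(kron(psi, psi)))   # F_2D Psi^*
phi = matmul(F, adjoint(psi))                        # phi = F psi^*
rhs = kron(phi, phi)                                 # phi (x) phi
err = max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))
print(err)
```

The printed discrepancy should be at the level of floating-point rounding.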


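In this section, in order to avoid any confusion, we will denote by $(e_i^{(n)})_{1 \le i \le n}$ the canonical basis in dimension $n$.

In Corollary 4.9, we focus on the case where $A_0 = \phi \otimes \phi \in \mathbb{C}^{n \times n}$ is the 2D Fourier-Shannon wavelet transform; then $\phi \in \mathbb{C}^{\sqrt{n} \times \sqrt{n}}$ is the 1D Fourier-Shannon wavelet transform. Therefore, $\phi$ and $A_0$ are block-diagonal orthogonal matrices. The sensing schemes are based on horizontal lines on the 2D plane, meaning that

$$B_k = \big[ \phi_{k,1}\phi \ \cdots \ \phi_{k,\sqrt{n}}\phi \big]$$

for $k = 1, \ldots, \sqrt{n}$. By definition of the Fourier-Shannon transform, we have that

$$B_k^* B_k = \big( \overline{\phi_{k,\ell}}\, \phi_{k,m}\, \mathrm{Id}_{\sqrt{n}} \big)_{1 \le \ell, m \le \sqrt{n}} = \big( \overline{\phi_{k,\ell}}\, \phi_{k,m}\, \delta_{\ell \in \tau_{j(k)}}\, \delta_{m \in \tau_{j(k)}}\, \mathrm{Id}_{\sqrt{n}} \big)_{1 \le \ell, m \le \sqrt{n}}$$

for $k = 1, \ldots, \sqrt{n}$, where $\delta_{\ell \in \tau_j} = 1$ if $\ell \in \tau_j$, and $0$ otherwise.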

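First let us start with the evaluation of $\Theta$. By definition of $\|\cdot\|_{\infty\to\infty}$, we have

$$\|B_k^* B_{k,S}\|_{\infty\to\infty} = \max_{1 \le \ell \le n} \sup_{\|v\|_\infty \le 1} \big| (e_\ell^{(n)})^* B_k^* B_k P_S v \big| .$$

Setting $v = v(k)$ the argument of the supremum in the last expression, then

$$\Theta := \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell \le n} \frac{1}{\pi_k} \big| (e_\ell^{(n)})^* B_k^* B_k P_S v \big| .$$

Note that $\|v\|_\infty \le 1$. The index $\ell$ can be rewritten as $\ell = (\ell_1 - 1)\sqrt{n} + \ell_2$, with $1 \le \ell_1, \ell_2 \le \sqrt{n}$. Then

$$\begin{aligned}
\Theta &:= \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} \Big| \overline{\phi_{k,\ell_1}} \big( \phi_{k,m}\, (e_{\ell_2}^{(\sqrt{n})})^* \big)_{1 \le m \le \sqrt{n}}\, P_S v \Big| \\
&= \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} \Big| \overline{\phi_{k,\ell_1}} \sum_{m=1}^{\sqrt{n}} \phi_{k,m}\, (e_{\ell_2}^{(\sqrt{n})})^* (P_S v)[m] \Big|,
\end{aligned}$$

where $(v)[m] \in \mathbb{C}^{\sqrt{n}}$ is the restriction of the vector $v$ to the $m$-th horizontal line, i.e. to the components indexed by $\{(m-1)\sqrt{n} + 1, \ldots, m\sqrt{n}\}$. Set $w^{(m)} := (P_S v)[m] \in \mathbb{C}^{\sqrt{n}}$, the restriction of $P_S v$ to the $m$-th horizontal line. Then the $\ell_2$-th component of $w^{(m)}$, written as $w^{(m)}_{\ell_2}$, is equal to $(e_{\ell_2}^{(\sqrt{n})})^* (P_S v)[m]$. Note that $|w^{(m)}_{\ell_2}| \le 1$ if $(m-1)\sqrt{n} + \ell_2 \in S$, and it is equal to $0$ otherwise. Then,

$$\Theta \le \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}| \Big| \sum_{m=1}^{\sqrt{n}} \phi_{k,m}\, w^{(m)}_{\ell_2} \Big| . \tag{40}$$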


By the properties of block-diagonality of the Fourier-Shannon transform, we have

$$\Theta \le \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}| \Big| \sum_{m \in \tau_{j(k)}} \phi_{k,m}\, w^{(m)}_{\ell_2} \Big| \tag{41}$$

$$\le \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} \|\phi_{k,:}\|_\infty^2 \sum_{m \in \tau_{j(k)}} |w^{(m)}_{\ell_2}| \lesssim \max_{1 \le k \le \sqrt{n}} \frac{1}{\pi_k}\, 2^{-j(k)}\, s^c_{j(k)} . \tag{42}$$

Indeed, $\sum_{m \in \tau_{j(k)}} |w^{(m)}_{\ell_2}|$ is bounded above by $\sum_{m \in \tau_{j(k)}} \delta_{(m-1)\sqrt{n} + \ell_2 \in S}$, which counts the number of intersections between $S$, the $\ell_2$-th column and the $j(k)$-th (horizontal) level, see the blue line in Figure 3. Taking the maximum over $1 \le \ell_2 \le \sqrt{n}$ leads to $\sum_{m \in \tau_{j(k)}} \delta_{(m-1)\sqrt{n} + \ell_2 \in S} \le s^c_{j(k)}$.


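Secondly, let us evaluate $\Upsilon$. We have that

$$\Upsilon := \max_{1 \le \ell \le n} \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} \big| (e_\ell^{(n)})^* B_k^* B_{k,S}\, v \big|^2,$$

where $v = v(\ell)$ is the argument of the supremum on the $\ell_\infty$ unit-ball. Using (41), with $\ell = (\ell_1 - 1)\sqrt{n} + \ell_2$, we can rewrite

$$\Upsilon = \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}|^2 \Big| \sum_{m=1}^{\sqrt{n}} \phi_{k,m}\, w^{(m)}_{\ell_2} \Big|^2,$$

where $w^{(m)} := (P_S v)[m]$. Note again that $|w^{(m)}_{\ell_2}| \le 1$ if $(m-1)\sqrt{n} + \ell_2 \in S$, and it is equal to $0$ otherwise. By denoting $w^{(:,\ell_2)}$ the vector with components

$$w^{(:,\ell_2)} := \big( w^{(1)}_{\ell_2}, w^{(2)}_{\ell_2}, \ldots, w^{(\sqrt{n})}_{\ell_2} \big), \tag{43}$$

we can rewrite the previous quantity as follows:

$$\Upsilon = \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} \big| \overline{\phi_{k,\ell_1}}\, \phi_{k,:}\, w^{(:,\ell_2)} \big|^2 = \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}|^2\, \big| \phi_{k,:}\, w^{(:,\ell_2)} \big|^2 . \tag{44}$$

Since $\phi$ is an orthogonal block-diagonal transform, we have

$$\Upsilon = \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \sum_{k \in \tau_{j(\ell_1)}} \frac{1}{\pi_k} |\phi_{k,\ell_1}|^2\, \big| \phi_{k,:}\, w^{(:,\ell_2)} \big|^2 .$$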


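Choosing $\pi_k = \tilde\pi_j$ for $k \in \tau_j$, meaning that the probability of drawing lines is constant by levels, we can write that

$$\begin{aligned}
\Upsilon &= \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\tilde\pi_{j(\ell_1)}} \sum_{k \in \tau_{j(\ell_1)}} |\phi_{k,\ell_1}|^2\, \big| \phi_{k,:}\, w^{(:,\ell_2)} \big|^2 \\
&\le \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\tilde\pi_{j(\ell_1)}} \max_{k \in \tau_{j(\ell_1)}} \|\phi_{k,:}\|_\infty^2 \sum_{k \in \tau_{j(\ell_1)}} \big| \phi_{k,:}\, w^{(:,\ell_2)} \big|^2 \\
&\lesssim \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{2^{-j(\ell_1)}}{\tilde\pi_{j(\ell_1)}} \big\| P_{\tau_{j(\ell_1)}}\, \phi\, w^{(:,\ell_2)} \big\|_2^2 .
\end{aligned}$$

Since $\phi$ is orthogonal and block diagonal, we have $\| P_{\tau_{j(\ell_1)}}\, \phi\, w^{(:,\ell_2)} \|_2^2 = \| P_{\tau_{j(\ell_1)}}\, w^{(:,\ell_2)} \|_2^2$. Then,

$$\Upsilon \lesssim \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{2^{-j(\ell_1)}}{\tilde\pi_{j(\ell_1)}} \big\| P_{\tau_{j(\ell_1)}}\, w^{(:,\ell_2)} \big\|_2^2 \le \max_{1 \le \ell_1 \le \sqrt{n}} \frac{2^{-j(\ell_1)}}{\tilde\pi_{j(\ell_1)}}\, s^c_{j(\ell_1)}, \tag{45}$$

where the last step invokes that $\| P_{\tau_{j(\ell_1)}}\, w^{(:,\ell_2)} \|_2^2 \le \sum_{m \in \tau_{j(\ell_1)}} \delta_{(m-1)\sqrt{n} + \ell_2 \in S} \le s^c_{j(\ell_1)}$. Note that the upper bounds (42) and (45) on $\Theta$ and $\Upsilon$ coincide. They lead to the following choice for $1 \le k \le \sqrt{n}$:

$$\pi_k = \tilde\pi_{j(k)} = \frac{2^{-j(k)}\, s^c_{j(k)}}{\sum_{\ell=1}^{\sqrt{n}} 2^{-j(\ell)}\, s^c_{j(\ell)}} .$$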
Then for this particular choice, we can rewrite

$$\max(\Theta, \Upsilon) \lesssim \sum_{j=0}^{J} s^c_j .$$

To conclude, by Theorem 3.3, a lower bound on the required number of horizontal lines to acquire is thus

$$m \gtrsim \sum_{j=0}^{J} s^c_j \, \ln(s) \ln(n/\varepsilon).$$
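The balancing argument behind the choice of level-constant probabilities can be illustrated numerically: taking $\pi_k$ proportional to $2^{-j(k)} s^c_{j(k)}$ makes the ratio $2^{-j(k)} s^c_{j(k)} / \pi_k$ identical for every line, so the maximum in (42) and (45) equals the normalising constant. The sketch below assumes a particular dyadic convention (level $0 = \{1\}$, level $j = \{2^{j-1}+1, \ldots, 2^j\}$) and hypothetical sparsities `s`; both are our assumptions for illustration, and constants hidden in $\lesssim$ are ignored.

```python
def level_of(k):
    """Dyadic level of the 1-based index k (assumed convention:
    level 0 = {1}, level j = {2^(j-1)+1, ..., 2^j} for j >= 1)."""
    return 0 if k == 1 else (k - 1).bit_length()


def level_probabilities(r, s):
    """pi_k proportional to 2^{-j(k)} * s[j(k)] for k = 1..r, summing to one."""
    weights = [2.0 ** -level_of(k) * s[level_of(k)] for k in range(1, r + 1)]
    total = sum(weights)
    return [w / total for w in weights], weights, total


r, s = 8, [1, 1, 2, 3]          # hypothetical: 8 lines, sparsities for levels 0..3
pi, weights, total = level_probabilities(r, s)
# max_k 2^{-j(k)} s_{j(k)} / pi_k: every ratio equals the normalising constant.
gamma_bound = max(w / p for w, p in zip(weights, pi) if p > 0)
print(pi, gamma_bound)
```

In this example the normalising constant is $1 + 0.5 + 1 + 1.5 = 4$, which is of the order of the sum of the sparsities by level, as the bound above suggests.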


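D.6 Proof of Corollary 4.10

In this part, using the formalism introduced in the last section, $\psi$ is the 1D Haar transform, and $\phi$ is then the Fourier-Haar wavelet transform. In such a case, we can reuse (40) in Section D.5 to evaluate $\Theta$:

$$\Theta \le \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}| \Big| \sum_{m=1}^{\sqrt{n}} \phi_{k,m}\, w^{(m)}_{\ell_2} \Big| .$$

Using Lemma D.1 we have, for $1 \le k, m \le \sqrt{n}$,

$$|\phi_{k,m}| \lesssim 2^{-j(k)/2}\, 2^{-|j(k) - j(m)|/2} .$$

Therefore,

$$\begin{aligned}
\Theta &\le \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}| \sum_{m=1}^{\sqrt{n}} |\phi_{k,m}|\, |w^{(m)}_{\ell_2}| \\
&= \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}| \sum_{j'=0}^{J} \sum_{m \in \tau_{j'}} |\phi_{k,m}|\, |w^{(m)}_{\ell_2}| \\
&\lesssim \max_{1 \le k \le \sqrt{n}} \max_{1 \le \ell_2 \le \sqrt{n}} \frac{1}{\pi_k}\, 2^{-j(k)} \sum_{j'=0}^{J} 2^{-|j(k) - j'|/2} \sum_{m \in \tau_{j'}} |w^{(m)}_{\ell_2}| \\
&\le \max_{1 \le k \le \sqrt{n}} \frac{1}{\pi_k}\, 2^{-j(k)} \sum_{j'=0}^{J} 2^{-|j(k) - j'|/2}\, s^c_{j'} .
\end{aligned} \tag{46}$$
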

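Now let us study $\Upsilon$. Recalling the definition of $w^{(:,\ell_2)}$, depending on $\ell_2$, in (43), we can reuse (44) to have

$$\Upsilon = \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \sum_{k=1}^{\sqrt{n}} \frac{1}{\pi_k} |\phi_{k,\ell_1}|^2\, \big| \phi_{k,:}\, w^{(:,\ell_2)} \big|^2 = \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \sum_{j=0}^{J} \frac{1}{\tilde\pi_j} \sum_{k \in \tau_j} |\phi_{k,\ell_1}|^2\, \big| \phi_{k,:}\, w^{(:,\ell_2)} \big|^2,$$

by choosing $\pi_k = \tilde\pi_j$ for $k \in \tau_j$, meaning that the drawing probability is constant by level. Since $|\phi_{k,\ell_1}|^2 \lesssim 2^{-j}\, 2^{-|j - j(\ell_1)|}$ by Lemma D.1 for $k \in \tau_j$, we have

$$\Upsilon \lesssim \max_{1 \le \ell_1, \ell_2 \le \sqrt{n}} \sum_{j=0}^{J} \frac{1}{\tilde\pi_j}\, 2^{-j}\, 2^{-|j - j(\ell_1)|} \underbrace{\sum_{k \in \tau_j} \big| \phi_{k,:}\, w^{(:,\ell_2)} \big|^2}_{=: K_j} .$$

Dealing with $K_j$, we can derive that

$$\sqrt{K_j} = \big\| P_{\tau_j}\, \phi\, w^{(:,\ell_2)} \big\|_2 = \Big\| P_{\tau_j}\, \phi \sum_{r=0}^{J} P_{\tau_r} P_{\tau_r}\, w^{(:,\ell_2)} \Big\|_2 \le \sum_{r=0}^{J} \| P_{\tau_j}\, \phi\, P_{\tau_r} \|_{2\to 2}\, \big\| P_{\tau_r}\, w^{(:,\ell_2)} \big\|_2 \lesssim \sum_{r=0}^{J} 2^{-|j-r|/2} \sqrt{s^c_r},$$

where the upper bound $\| P_{\tau_j}\, \phi\, P_{\tau_r} \|_{2\to 2} \lesssim 2^{-|j-r|/2}$ can be found in [AHR14a, Lemma 4.3]. Then,

$$K_j \lesssim \Big( \sum_{r=0}^{J} 2^{-|j-r|/2} \sqrt{s^c_r} \Big)^2 \le \Big( \sum_{r=0}^{J} 2^{-|j-r|/2} \Big) \Big( \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r \Big) \lesssim \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r .$$

Therefore,

$$\begin{aligned}
\Upsilon &\lesssim \max_{1 \le \ell_1 \le \sqrt{n}} \sum_{j=0}^{J} \frac{1}{\tilde\pi_j}\, 2^{-j}\, 2^{-|j - j(\ell_1)|} \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r \\
&\le \max_{1 \le \ell_1 \le \sqrt{n}} \Big( \sum_{j=0}^{J} 2^{-|j - j(\ell_1)|} \Big) \Big( \max_{0 \le j \le J} \frac{2^{-j}}{\tilde\pi_j} \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r \Big) \\
&\lesssim \max_{0 \le j \le J} \frac{2^{-j}}{\tilde\pi_j} \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r .
\end{aligned} \tag{47}$$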


The upper bounds (46) and (47) give

$$\max(\Theta, \Upsilon) \lesssim \max_{0 \le j \le J} \frac{2^{-j}}{\tilde\pi_j} \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r .$$

Therefore, by Theorem 3.3, a lower bound on the required number of horizontal lines is

$$m \gtrsim \max_{0 \le j \le J} \frac{2^{-j}}{\tilde\pi_j} \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r \cdot \ln(n/\varepsilon) \ln(s).$$

By choosing

$$\tilde\pi_{j(k)} = \frac{2^{-j(k)} \sum_{r=0}^{J} 2^{-|j(k)-r|/2}\, s^c_r}{\sum_{\ell=1}^{\sqrt{n}} 2^{-j(\ell)} \sum_{r=0}^{J} 2^{-|j(\ell)-r|/2}\, s^c_r}$$

for $1 \le k \le \sqrt{n}$, the lower bound on the required number of horizontal lines can be rewritten as

$$\begin{aligned}
m &\gtrsim \sum_{\ell=1}^{\sqrt{n}} 2^{-j(\ell)} \sum_{r=0}^{J} 2^{-|j(\ell)-r|/2}\, s^c_r \cdot \ln(n/\varepsilon) \ln(s) \\
&\asymp \sum_{j=0}^{J} \sum_{r=0}^{J} 2^{-|j-r|/2}\, s^c_r \cdot \ln(n/\varepsilon) \ln(s) \\
&= \sum_{j=0}^{J} \Big( s^c_j + \sum_{\substack{r=0 \\ r \ne j}}^{J} 2^{-|j-r|/2}\, s^c_r \Big) \cdot \ln(n/\varepsilon) \ln(s),
\end{aligned}$$

which concludes the proof of Corollary 4.10.